
Which TTS Service Actually Lets You Turn a PDF into a Private, Chaptered Podcast?

If you need a commuter-ready episode from a PDF and you care about privacy, the choice of service matters. Some TTS services log text and audio. Some offer zero retention on enterprise plans. Others run entirely on-device.

Comparison Snapshot

  • ElevenLabs (cloud): enterprise Zero Retention Mode, enabled with the API flag enable_logging=false; strong voice options for podcast tone; an enterprise feature, not a free toggle. See: Zero Retention Mode; Privacy policy.
  • Google Cloud Text-to-Speech (cloud): vendor states it does not log customer TTS text or audio; good for automated batch jobs and regional endpoints. See: Data logging.
  • Amazon Polly (cloud/AWS): enterprise-grade controls and encryption; docs warn that diagnostic logs may capture free-form inputs, so avoid sending highly sensitive identifiers. See: Data protection.
  • Coqui TTS (local/open-source): fully local install and inference; best option when you must avoid any cloud upload. See: Installation docs.
  • OpenAI (cloud): provides documented data controls and enterprise DPAs; check your contract for retention, logging, and workspace policies. See: Data controls guide.

Deep dive: a commuter-ready scenario

Scenario: You have a 60‑page PDF (board report, paper, or textbook chapter). You want a single audio file with chapter markers and no copy of the text retained by the vendor.

ElevenLabs (what it gives you)

  • How it helps: ElevenLabs supports a Zero Retention Mode for TTS that restricts logging and deletes most request/response data immediately. It’s available to enterprise customers and is enabled via API parameters (example: enable_logging=false) so generated audio and input text aren’t stored long‑term in the request history. Source: ElevenLabs Zero Retention Mode.
  • Limitations: Zero Retention is an enterprise feature and may be restricted for higher‑risk use cases. Confirm access and auditing before batching sensitive files.
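
A minimal sketch of where the flag goes, assuming the public REST route POST /v1/text-to-speech/{voice_id} and the xi-api-key header. The voice ID, API key, and model ID below are placeholders, and whether the flag is honored depends on your plan. The request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request

# Assumed base route for the ElevenLabs text-to-speech endpoint.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(text, voice_id, api_key, model_id="eleven_multilingual_v2"):
    """Build (but do not send) a TTS request with request logging disabled."""
    # enable_logging travels as a query parameter, not in the JSON body.
    query = urllib.parse.urlencode({"enable_logging": "false"})
    url = f"{API_BASE}/{urllib.parse.quote(voice_id)}?{query}"
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )


# Placeholder credentials: replace before calling urllib.request.urlopen(req).
req = build_tts_request("Chapter one.", "VOICE_ID_PLACEHOLDER", "API_KEY_PLACEHOLDER")
```

Building the request separately from sending it makes it easy to log or assert on the URL in a dry run before any sensitive text leaves the machine.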

Google Cloud Text-to-Speech (what it gives you)

  • How it helps: Google’s Cloud TTS documentation says it does not log customer Cloud TTS text or audio data, and it supports regional endpoints for residency requirements. That makes it a straightforward path if you run batch conversions from a GCP project. Source: Google Cloud TTS data logging.
  • Limitations: You still operate under Google Cloud IAM and billing; get a DPA if your organization needs contractual guarantees.

Amazon Polly (what it gives you)

  • How it helps: Polly runs under AWS’s compliance and encryption framework. You control IAM, encryption, region, and logging configuration.
  • What to watch for: AWS docs explicitly recommend against sending sensitive identifiers in free‑form fields because diagnostic logs can capture input text. Treat Polly as a secure platform but follow the shared responsibility model. Source: Amazon Polly data protection.
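
One way to act on that warning is to scrub obvious identifiers from each section before it is sent. The sketch below is illustrative only: the two patterns (email addresses and US-style SSNs) are assumptions about what counts as sensitive, and a regex pass will not catch everything in a real document.

```python
import re

# Illustrative patterns only; a real pipeline needs a reviewed, domain-specific list.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace matches with a spoken-friendly marker and count them per pattern."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, hits = pattern.subn(f"[{label} removed]", text)
        counts[label] = hits
    return text, counts


clean, counts = redact("Contact jane.doe@example.com, SSN 123-45-6789.")
```

The marker text will be read aloud by the TTS voice, so pick wording that still makes sense in the audio. The per-pattern counts give a quick way to review what was removed before a batch run.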

Coqui TTS (what it gives you)

  • How it helps: Coqui is installable and runnable locally. Inference makes no network calls once the model files are on disk. If you must guarantee zero upload, run OCR and TTS on the same machine and you never touch a cloud vendor. Source: Coqui installation.
  • Limitations: Local inference needs CPU/GPU resources and more engineering to stitch chapters and polish audio.
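
A minimal sketch of per-section local synthesis with the Coqui Python API, assuming the TTS package is installed and the model weights are already downloaded (the first use of a named model fetches them). The model name is just an example, and the sections list is assumed to come from your own PDF parsing step.

```python
from pathlib import Path


def synthesize_sections(sections, out_dir, tts=None):
    """Write one WAV per (title, text) section and return the file paths in order."""
    if tts is None:
        # Imported lazily so the helper can be exercised without Coqui installed.
        from TTS.api import TTS
        tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")  # example model
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, (_title, text) in enumerate(sections, start=1):
        # Zero-padded names keep the files in chapter order when sorted.
        path = out_dir / f"{index:03d}.wav"
        tts.tts_to_file(text=text, file_path=str(path))
        paths.append(path)
    return paths
```

Calling synthesize_sections([("Summary", "Text of the summary.")], "audio") would produce audio/001.wav. Passing your own tts object lets you reuse one loaded model across many documents.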

OpenAI (data controls)

  • How it helps: OpenAI documents data controls and business agreements; enterprise workspaces can have custom retention and policies. Check the developer docs and your DPA for exact retention details. Source: OpenAI data controls.

Practical chaptering and export (what vendors do and don’t)

  • Most TTS APIs deliver audio per request. They rarely emit container-level chapter metadata (ID3 chapter frames or M4B chapters) as part of the API response. The practical pipeline that works across vendors is:
  1. Split the PDF into headings/sections (use a parser or an OCR step for scanned pages).
  2. Generate per-section audio files (one API call per section). This yields clear boundaries and smaller retries.
  3. Stitch files offline into a single M4B or MP4 and add chapter metadata with a dedicated tool (for example m4b-tool, or ffmpeg with a chapter metadata file). This keeps control local even when you use cloud TTS for speech quality.
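
The chapter step can be sketched as a small generator for an ffmpeg metadata file. It assumes you already know each section's title and audio duration in seconds (for example, measured from the per-section files); the function names are illustrative.

```python
def escape(value):
    """Escape characters that are special in ffmpeg metadata files."""
    # Backslash goes first so the escapes added below are not escaped again.
    for ch in ("\\", "=", ";", "#", "\n"):
        value = value.replace(ch, "\\" + ch)
    return value


def build_ffmetadata(chapters):
    """Build FFMETADATA text from (title, duration_seconds) pairs in playback order."""
    lines = [";FFMETADATA1"]
    start_ms = 0
    for title, seconds in chapters:
        end_ms = start_ms + round(seconds * 1000)
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # START and END are in milliseconds
            f"START={start_ms}",
            f"END={end_ms}",
            f"title={escape(title)}",
        ]
        start_ms = end_ms  # the next chapter begins where this one ends
    return "\n".join(lines) + "\n"


metadata = build_ffmetadata([("Introduction", 95.0), ("Q3 Results", 412.5)])
```

Write the result to a file such as chapters.txt and hand it to ffmpeg alongside the merged audio; a typical invocation looks like `ffmpeg -i merged.m4a -i chapters.txt -map_metadata 1 -map_chapters 1 -c copy book.m4b`, though the exact flags depend on your input format.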

Cost and throttling notes

  • Cloud TTS services bill per character or per second of generated audio. Enterprise zero‑retention may be priced differently and can require sales contact. Always test a representative page count to estimate cost before batch runs.
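
A rough way to run that test is to extrapolate from a few extracted pages. This is a sketch for per-character billing only; the price used in the example is a hypothetical placeholder, not any vendor's actual rate.

```python
def estimate_cost(sample_pages, total_pages, price_per_million_chars):
    """Extrapolate billable characters and cost from a few sample pages of text."""
    if not sample_pages:
        raise ValueError("need at least one sample page")
    avg_chars = sum(len(page) for page in sample_pages) / len(sample_pages)
    total_chars = round(avg_chars * total_pages)
    cost = total_chars / 1_000_000 * price_per_million_chars
    return total_chars, cost


# Two sample pages of 2,800 and 3,200 characters from a 60-page PDF,
# priced at a made-up 16.00 per million characters.
chars, cost = estimate_cost(["x" * 2800, "x" * 3200], total_pages=60,
                            price_per_million_chars=16.0)
```

Sample pages from different parts of the document, since tables, references, and figure-heavy pages can skew the average.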

FAQ

Can I get legally binding zero‑retention for TTS outputs?

Yes, if the vendor puts it in a contract. Many vendors offer enterprise controls or DPAs that specify retention. ElevenLabs exposes a Zero Retention Mode to enterprise customers (source: Zero Retention Mode). Google Cloud’s TTS docs state that Cloud TTS does not log TTS text/audio (source: Google Cloud TTS data logging).

Which option is simplest for a non‑technical commuter user?

Use a consumer app that exports MP3/M4A. If you need privacy guarantees, pick a cloud TTS provider with explicit non‑logging statements, or run a local TTS like Coqui if you can install it (see: Coqui installation).

Can I make a single MP3 with clickable chapters from these APIs?

APIs usually return audio blobs. Build chaptered files by generating per‑heading audio and merging them locally with an M4B/ID3 tool. Cloud APIs rarely return container chapter metadata directly.

Should I trust a vendor’s “we don’t retain data” claim?

Trust, but verify. Ask for contractual terms (DPA), test request history in a trial account, and inspect billing and logs. For enterprise‑grade assurance, use features explicitly labeled Zero Retention or documented non‑logging policies (sources: ElevenLabs; Google Cloud TTS).

Sources