
From PDF to Chaptered MP3: A Zotero‑First Workflow for Researchers

The problem

You want to listen to papers. Not as one long blob. You want chapters. You want your highlights and linkbacks with the audio. You want a private, repeatable workflow.

Tools promise this. But they rarely connect. Zotero holds PDFs and annotations. TTS services produce natural speech. Audiobook containers can hold chapters. The glue is practical: export, chunk, synthesize, stitch.

This guide shows a working, tool-agnostic pipeline. It uses Zotero for notes, a modern TTS API for narration, and ffmpeg-style metadata to create chaptered M4B/MP3 files you can load on any player.

What the pieces can do (quick facts)

  • Zotero can export notes to Markdown. The Zotero team added Markdown note export in the Zotero beta, which gives you a simple way to turn annotations into files you can process programmatically.
  • ElevenLabs’ Studio API exposes chapters. The API has endpoints to list and stream chapter snapshots for a Studio project, letting you manage chaptered audio server-side.
  • Google Cloud Text‑to‑Speech supports long‑form audio synthesis. The Long Audio Synthesis endpoint accepts large inputs (the doc notes a 1MB input limit for long synthesis) and outputs audio to Cloud Storage.
  • FFmpeg (and simple helper scripts) can add chapter metadata to MP4/M4B containers via an FFMETADATA file. The technique is to write [CHAPTER] blocks with START/END timestamps and then remux the audio with that metadata.

Sources: Zotero forum notes, ElevenLabs API docs, Google Cloud long‑audio docs, and practical ffmpeg chapter guides.

The end‑result in one line

A folder of PDFs becomes: (a) exported Markdown notes and highlights; (b) a set of chapter texts; (c) TTS-generated audio per chapter; (d) a single M4B/MP3 with embedded chapter markers and optional linkbacks to Zotero.

Step‑by‑step pipeline (practical)

1) Export highlights and notes from Zotero

  • In Zotero, use the built‑in Markdown export for notes (or a maintained plugin that exports annotations). That gives you per‑item notes and inline highlights in plain Markdown.
  • Why this matters: you now have structured text and linkable references (zotero:// links in exported Markdown), which you can include in chapter descriptions or show in your listening app’s notes pane.

2) Turn each document into chapterable text

  • Split the exported Markdown by obvious anchors: top‑level headings, "Abstract", "Introduction", or manually flagged headings from your notes.
  • The simplest automation: a short script that reads the Markdown, finds H1/H2 headings, and writes one text file per chapter with a canonical title.
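The splitter described above can be sketched in a few lines of Python. This is an illustrative starting point, not part of any Zotero tool: function names and the output file layout are assumptions.

```python
import re
from pathlib import Path

def split_markdown_chapters(md_text):
    """Split exported Markdown into (title, body) pairs at H1/H2 headings."""
    chapters = []
    title, body = None, []
    for line in md_text.splitlines():
        m = re.match(r"^(#{1,2})\s+(.*)", line)
        if m:
            if title is not None:
                chapters.append((title, "\n".join(body).strip()))
            title, body = m.group(2).strip(), []
        elif title is not None:
            body.append(line)
    if title is not None:
        chapters.append((title, "\n".join(body).strip()))
    return chapters

def write_chapter_files(md_path, out_dir):
    """Write one numbered text file per chapter, title on the first line."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    md_text = Path(md_path).read_text(encoding="utf-8")
    for i, (title, body) in enumerate(split_markdown_chapters(md_text), 1):
        # Build a filesystem-safe canonical name from the heading.
        safe = re.sub(r"[^\w\- ]", "", title).strip().replace(" ", "_") or f"chapter_{i}"
        (out / f"{i:02d}_{safe}.txt").write_text(f"{title}\n\n{body}\n", encoding="utf-8")
```

Text before the first heading is deliberately dropped here; if your exports carry a preamble worth keeping, treat it as its own "Front matter" chapter.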

3) Synthesize audio per chapter with a TTS API

  • Two solid options: ElevenLabs (Studio) or Google Cloud Text‑to‑Speech.
  • ElevenLabs: If you want programmatic chapter management, ElevenLabs Studio supports chapters via its API (list chapters, snapshot streaming). Use the Studio API to upload chapter text and request per‑chapter audio.
  • Google Cloud TTS: Use the long‑form synthesis endpoint to create large audio outputs. The long audio API writes output to a Cloud Storage URI and supports SSML controls for pauses and emphasis. Note the documented input size limit for long synthesis.
  • Practical tip: keep chapters under 10–15 minutes for easier error recovery and better pacing. TTS APIs accept SSML, so add short pauses at section breaks and read footnotes sparingly.
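The SSML preparation is easy to automate before any API call. A minimal sketch, assuming blank‑line‑separated paragraphs and a configurable pause length (the helper name is hypothetical, not from either vendor's SDK):

```python
from xml.sax.saxutils import escape

def chapter_to_ssml(text, para_break_ms=600):
    """Wrap plain chapter text in SSML, inserting a pause between paragraphs.

    Paragraphs are blank-line separated. Text is XML-escaped so stray
    '&' or '<' characters in exported notes do not break synthesis.
    """
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]
    brk = f'<break time="{para_break_ms}ms"/>'
    body = brk.join(f"<p>{escape(p)}</p>" for p in paras)
    return f"<speak>{body}</speak>"
```

The resulting string goes into the SSML field of whichever TTS request you use; both ElevenLabs and Google Cloud document their own voice and audio-config parameters separately.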

4) Assemble and add chapter markers

  • Concatenate the per‑chapter audio files in order.
  • Create an ffmetadata (FFMPEG metadata) file with [CHAPTER] blocks. The file opens with an ;FFMETADATA1 header; each block needs a TIMEBASE, integer START and END timestamps, and a title, one key per line. Example block:

;FFMETADATA1
[CHAPTER]
TIMEBASE=1/1000
START=0
END=448000
title=Introduction

  • Remux the audio and metadata with ffmpeg to produce an M4B (or MP4) container that supports chapters:

ffmpeg -i concatenated.m4a -i ffmetadata.txt -map_metadata 1 -codec copy output.m4b

  • Many audiobook players and apps recognize M4B chapter marks. If you need MP3, some players read sidecar cue files, but M4B is the most reliable container for chapters.
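If you know each chapter's duration (e.g. measured with ffprobe after synthesis), generating the FFMETADATA file is a small script. A minimal Python sketch; the function name is illustrative and durations are assumed to be supplied by an earlier step:

```python
def build_ffmetadata(chapters):
    """Build an FFMETADATA1 file body from (title, duration_seconds) pairs.

    START/END are cumulative offsets in the 1/1000 timebase (milliseconds),
    since ffmpeg's [CHAPTER] blocks expect integer timestamps.
    """
    lines = [";FFMETADATA1"]
    start_ms = 0
    for title, dur_s in chapters:
        end_ms = start_ms + int(round(dur_s * 1000))
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",
            f"START={start_ms}",
            f"END={end_ms}",
            f"title={title}",
        ]
        start_ms = end_ms  # next chapter starts where this one ends
    return "\n".join(lines) + "\n"
```

Write the returned string to ffmetadata.txt and remux as shown above; the chapter order in the list must match the concatenation order of the audio files.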

5) Optional: embed Zotero links and notes

  • From your exported Markdown, add short notes or URLs as chapter descriptions (not all players show them, but many do). You can also keep a Markdown companion file with per‑chapter linkbacks to zotero:// openers.

Two short examples

  • Minimal researcher: Export notes from Zotero, run a script to split by headings, send each text chunk to Google Cloud TTS long‑audio (or ElevenLabs for more expressive voices), then run ffmpeg with an FFMETADATA file to embed chapters.
  • End‑to‑end studio: Use ElevenLabs Studio’s chapters API to manage chapter conversion server‑side, then download the final chaptered audio or programmatically stitch with ffmpeg metadata if you need a single M4B file.

Privacy and reliability notes

  • If data privacy matters, synthesize on a cloud account you control (Google Cloud) or self‑host TTS models when possible. ElevenLabs is convenient but check workspace and API privacy settings before uploading unpublished manuscripts.
  • Long documents may exceed single‑call limits. Split before synthesis. That keeps the pipeline resilient and lets you retry single chapters instead of regenerating the entire paper.
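Pre‑synthesis splitting can respect sentence boundaries so no chunk is cut mid‑sentence. A sketch under stated assumptions: the 900 KB default is an illustrative safety margin below a 1 MB input cap, not an official figure, and a single sentence longer than the budget is kept whole rather than split.

```python
import re

def chunk_for_tts(text, max_bytes=900_000):
    """Split text into chunks, each at most max_bytes of UTF-8,
    breaking only at sentence-ending punctuation."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for s in sentences:
        candidate = (current + " " + s).strip()
        if len(candidate.encode("utf-8")) > max_bytes and current:
            # Budget exceeded: close out the current chunk, start a new one.
            chunks.append(current)
            current = s
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk then becomes one synthesis call, so a failed request costs you one retry instead of a full regeneration.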

The takeaway

You no longer need a single "PDF to podcast" button. Combine Zotero’s Markdown export with chunked TTS synthesis and a short ffmpeg step that injects chapters. The result is usable, searchable, and fits normal audiobook players.

If you want my utility script (Zotero‑export → split → TTS → ffmetadata → M4B), I can share a tested starter repo and a short bash example that works with Google Cloud and ElevenLabs keys.