Run the text pipeline

This guide walks through the current text pipeline in this repo: chunk a transcript, generate stub summaries, then generate candidate assets. The commands only support stub/dry-run modes today (no external LLM calls).

1. Prepare inputs

  • A transcript plain-text file (UTF-8).

  • Optional: a chapters text file with one chapter per line.

2. Run chunking + summaries

If you want to run the full text pipeline (summaries + candidates) in one step, use:

podcast draft \
  --dry-run \
  --workspace ./workspaces/ep_001 \
  --transcript /path/to/transcript.txt \
  --chapters /path/to/chapters.txt \
  --episode-id ep_001

Notes:

  • --workspace must not exist; the command creates it.

podcast summarize \
  --dry-run \
  --workspace ./workspaces/ep_001 \
  --transcript /path/to/transcript.txt \
  --episode-id ep_001

Notes:

  • --workspace must not exist; it will be created.

  • The transcript is copied into transcript/transcript.txt under the workspace.

3. (Optional) Add chapters

Chapters are used when generating assets. Supply them using any of these sources (first match wins):

  • Put transcript/chapters.txt inside the workspace.

  • Set inputs.chapters in episode.yaml.

  • Pass --chapters /path/to/chapters.txt to the next step.

4. Generate candidate assets

podcast draft-candidates --workspace ./workspaces/ep_001 --candidates 3
podcast draft-candidates \
  --workspace ./workspaces/ep_001 \
  --chapters /path/to/chapters.txt

5. Inspect outputs

  • Chunk text + metadata: transcript/chunks/chunk_0001.txt and .json.

  • Chunk summaries: summaries/chunks/chunk_0001.summary.json.

  • Episode summary: summaries/episode/episode_summary.{json,md,html}.

  • Candidate assets: copy/candidates/<asset_id>/candidate_<uuid>.{json,md,html}.