Run the text pipeline
This guide walks through the current text pipeline in this repo: chunk a transcript, generate stub summaries, then generate candidate assets. The commands only support stub/dry-run modes today (no external LLM calls).
1. Prepare inputs
A transcript plain-text file (UTF-8).
Optional: a chapters text file with one chapter per line.
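Before running anything, it can help to confirm that the inputs match these requirements. The following is a minimal sketch, assuming a POSIX shell with `iconv` available; `is_utf8_text` and `count_chapters` are hypothetical helper names and are not part of the `podcast` CLI.

```shell
# Hypothetical helpers: not provided by the podcast CLI.

# Succeeds only if the file exists, is non-empty, and decodes as UTF-8.
is_utf8_text() {
    [ -s "$1" ] && iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Prints the number of non-blank lines, which equals the number of
# chapters under the one-chapter-per-line convention.
count_chapters() {
    grep -c '[^[:space:]]' "$1"
}
```

For example, `is_utf8_text /path/to/transcript.txt || echo "check the transcript encoding" >&2` flags a missing, empty, or non-UTF-8 transcript before the pipeline starts.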
2. Run chunking + summaries
If you want to run the full text pipeline (summaries + candidates) in one step, use:
podcast draft \
--dry-run \
--workspace ./workspaces/ep_001 \
--transcript /path/to/transcript.txt \
--chapters /path/to/chapters.txt \
--episode-id ep_001
Notes:
--workspace must not exist; the command creates it.
To run only chunking and summaries, use this instead:
podcast summarize \
--dry-run \
--workspace ./workspaces/ep_001 \
--transcript /path/to/transcript.txt \
--episode-id ep_001
Notes:
--workspace must not exist; it will be created.
The transcript is copied into transcript/transcript.txt under the workspace.
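Because the workspace path must not already exist, a small guard can catch a leftover directory before the command fails. This is a sketch only; `workspace_is_free` is a hypothetical helper, not something the `podcast` CLI provides.

```shell
# Hypothetical pre-flight check, not a feature of the podcast CLI.
# Returns 0 when nothing exists at the given path, 1 otherwise.
workspace_is_free() {
    if [ -e "$1" ]; then
        echo "workspace already exists: $1" >&2
        return 1
    fi
}

# Intended use (kept as a comment so the block has no side effects):
#   workspace_is_free ./workspaces/ep_001 && podcast summarize --dry-run ...
```

Chaining with `&&` means the pipeline command is only attempted when the path is free.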
3. (Optional) Add chapters
Chapters are used when generating assets. Supply them using any of these sources (first match wins):
Put transcript/chapters.txt inside the workspace.
Set inputs.chapters in episode.yaml.
Pass --chapters /path/to/chapters.txt to the next step.
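The "first match wins" rule can be pictured as a simple fall-through. The sketch below assumes the list order above is the precedence order; it illustrates the documented behavior and is not the tool's actual code. `resolve_chapters` is a hypothetical name, and the `inputs.chapters` value is passed in as an argument rather than parsed from `episode.yaml`.

```shell
# Illustrative only: mirrors the documented precedence.
# $1 = workspace directory
# $2 = value of inputs.chapters from episode.yaml (empty string if unset)
# $3 = value passed via --chapters (empty string if omitted)
# Prints the chosen path and returns 0, or returns 1 when no source exists.
resolve_chapters() {
    if [ -f "$1/transcript/chapters.txt" ]; then
        printf '%s\n' "$1/transcript/chapters.txt"
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"
    else
        return 1
    fi
}
```

Under this reading, a chapters file already present in the workspace takes priority over a path given with `--chapters`.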
4. Generate candidate assets
Run against the existing workspace:
podcast draft-candidates --workspace ./workspaces/ep_001 --candidates 3
To supply chapters explicitly, pass --chapters:
podcast draft-candidates \
--workspace ./workspaces/ep_001 \
--chapters /path/to/chapters.txt
5. Inspect outputs
Chunk text + metadata: transcript/chunks/chunk_0001.txt and .json.
Chunk summaries: summaries/chunks/chunk_0001.summary.json.
Episode summary: summaries/episode/episode_summary.{json,md,html}.
Candidate assets: copy/candidates/<asset_id>/candidate_<uuid>.{json,md,html}.
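One quick way to check a run is to confirm that every chunk has a summary. The sketch below assumes each transcript/chunks/chunk_NNNN.txt pairs with summaries/chunks/chunk_NNNN.summary.json, as the chunk_0001 example suggests; `missing_summaries` is a hypothetical helper, not part of the `podcast` CLI.

```shell
# Hypothetical helper: prints the name of every chunk without a summary file.
# $1 = workspace directory
missing_summaries() {
    for chunk in "$1"/transcript/chunks/chunk_*.txt; do
        [ -e "$chunk" ] || continue   # the glob matched no files
        name=$(basename "$chunk" .txt)
        [ -f "$1/summaries/chunks/$name.summary.json" ] || printf '%s\n' "$name"
    done
}
```

Running `missing_summaries ./workspaces/ep_001` prints nothing when every chunk found has a matching summary.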