# Run the text pipeline This guide walks through the current text pipeline in this repo: chunk a transcript, generate stub summaries, then generate candidate assets. The commands only support stub/dry-run modes today (no external LLM calls). ## 1. Prepare inputs - A transcript plain-text file (UTF-8). - Optional: a chapters text file with one chapter per line. ## 2. Run chunking + summaries If you want to run the full text pipeline (summaries + candidates) in one step, use: ```bash podcast draft \ --dry-run \ --workspace ./workspaces/ep_001 \ --transcript /path/to/transcript.txt \ --chapters /path/to/chapters.txt \ --episode-id ep_001 ``` Notes: - `--workspace` must not exist; the command creates it. ```bash podcast summarize \ --dry-run \ --workspace ./workspaces/ep_001 \ --transcript /path/to/transcript.txt \ --episode-id ep_001 ``` Notes: - `--workspace` must not exist; it will be created. - The transcript is copied into `transcript/transcript.txt` under the workspace. ## 3. (Optional) Add chapters Chapters are used when generating assets. Supply them using any of these sources (first match wins): - Put `transcript/chapters.txt` inside the workspace. - Set `inputs.chapters` in `episode.yaml`. - Pass `--chapters /path/to/chapters.txt` to the next step. ## 4. Generate candidate assets ```bash podcast draft-candidates --workspace ./workspaces/ep_001 --candidates 3 ``` ```bash podcast draft-candidates \ --workspace ./workspaces/ep_001 \ --chapters /path/to/chapters.txt ``` ## 5. Inspect outputs - Chunk text + metadata: `transcript/chunks/chunk_0001.txt` and `.json`. - Chunk summaries: `summaries/chunks/chunk_0001.summary.json`. - Episode summary: `summaries/episode/episode_summary.{json,md,html}`. - Candidate assets: `copy/candidates//candidate_.{json,md,html}`.