Fetch few-shot datasets
Use these commands to fetch recent episodes and write JSONL files for few-shot selection. For the output schema, see
reference/few-shot-dataset-jsonl.md.
Before you start
- Ensure the `podcast` CLI is available (run `podcast --help`).
- RSS feeds must expose RSS 2.0 `<item>` entries with `<description>` or `<content:encoded>`.
- Wagtail/CMS endpoints must return a list under `items`, `results`, or `pages`.
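The two data prerequisites above can be checked before running the CLI. The following is a minimal sketch, not the CLI's own validation: the function names `usable_rss_items` and `find_episode_list` and the sample feed are hypothetical, and the key lookup order (`items`, then `results`, then `pages`) is an assumption taken from the order in which this page lists them.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the RSS "content" module, which defines <content:encoded>.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def usable_rss_items(feed_xml: str) -> int:
    """Count <item> entries that carry <description> or <content:encoded>."""
    root = ET.fromstring(feed_xml)
    count = 0
    for item in root.iter("item"):
        has_description = item.find("description") is not None
        has_encoded = item.find(f"{CONTENT_NS}encoded") is not None
        if has_description or has_encoded:
            count += 1
    return count


def find_episode_list(payload: dict):
    """Return the first list found under items, results, or pages, else None."""
    # Assumption: keys are tried in the order this page lists them.
    for key in ("items", "results", "pages"):
        value = payload.get(key)
        if isinstance(value, list):
            return value
    return None


# Hypothetical sample feed: two usable items and one without any description.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example</title>
    <item><title>One</title><description>Plain summary</description></item>
    <item><title>Two</title><content:encoded><![CDATA[<p>Notes</p>]]></content:encoded></item>
    <item><title>Three</title></item>
  </channel>
</rss>"""

print(usable_rss_items(SAMPLE_FEED))
print(find_episode_list({"meta": {"total_count": 1}, "items": [{"id": 1}]}))
```

If the first check reports zero usable items, or the second returns `None`, the source probably does not meet the prerequisites listed above.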
RSS feed examples
podcast rss-examples \
--feed-url https://example.com/feed.xml \
--output datasets/rss_examples.jsonl \
--limit 25
Notes:
- `--output` defaults to `datasets/rss_examples.jsonl` if you omit it.
- `--limit` caps how many episodes are written (default `25`).
- `--timeout-seconds` sets the HTTP timeout.
- The JSONL `output` field contains normalized episode description HTML.
Wagtail/CMS API examples
podcast cms-examples \
--api-url "https://example.com/api/v2/pages/?type=podcast.EpisodePage" \
--output datasets/cms_examples.jsonl \
--limit 25
Notes:
- The command skips items that lack a title, or that have neither description nor shownotes HTML.
- Field names support dot-notation for nested payloads (for example, `meta.html_url`).
- The CLI always sends a `limit` query parameter; include any other filters in the `--api-url` value.
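To illustrate what dot-notation means for a nested payload, here is a minimal sketch of how a dotted field name such as `meta.html_url` can be resolved against one item. The helper `resolve_field` and the sample item are hypothetical; the CLI's actual lookup may differ, for example in how it treats missing keys.

```python
from typing import Any


def resolve_field(payload: dict, dotted_name: str, default: Any = None) -> Any:
    """Walk a nested dict one key at a time, following a dotted field name."""
    current: Any = payload
    for part in dotted_name.split("."):
        # Stop as soon as the path leaves dict territory or a key is missing.
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


# Hypothetical item shaped like a Wagtail API page entry.
sample_item = {
    "id": 7,
    "title": "Episode 7",
    "meta": {
        "slug": "episode-7",
        "html_url": "https://example.com/episodes/episode-7/",
    },
}

print(resolve_field(sample_item, "title"))
print(resolve_field(sample_item, "meta.html_url"))
print(resolve_field(sample_item, "meta.first_published_at"))
```

In this sketch a missing key yields the default instead of raising an error, so one incomplete item does not stop the run.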
If your Wagtail API returns metadata under `meta`, supply the field mapping explicitly:
podcast cms-examples \
--api-url "https://example.com/api/v2/pages/?type=podcast.EpisodePage" \
--link-field meta.html_url \
--slug-field meta.slug \
--published-field meta.first_published_at \
--page-id-field meta.id
The CMS command supports these field options:
- `--title-field`
- `--summary-field`
- `--description-field`
- `--shownotes-field`
- `--tags-field`
- `--link-field`
- `--slug-field`
- `--published-field`
- `--page-id-field`
Check the output
Each line in the JSONL file is a single JSON object. See
reference/few-shot-dataset-jsonl.md for the full field list.
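A quick structural check confirms that every line parses as a single JSON object. This is a minimal sketch: the helpers `check_jsonl_lines` and `check_jsonl_file` are hypothetical and are not part of the `podcast` CLI. They check line structure only, not the field list in the reference.

```python
import json


def check_jsonl_lines(lines):
    """Return (object_count, problems) for an iterable of JSONL lines."""
    count = 0
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {number}: blank line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        # A valid JSONL record here must be an object, not a list or scalar.
        if not isinstance(record, dict):
            problems.append(f"line {number}: not a JSON object")
            continue
        count += 1
    return count, problems


def check_jsonl_file(path):
    """Run the same check against a file on disk."""
    with open(path, encoding="utf-8") as handle:
        return check_jsonl_lines(handle)


# In-memory demonstration with one good line and two bad ones.
demo_lines = ['{"output": "<p>Notes</p>"}', "[1, 2, 3]", "{broken"]
demo_count, demo_problems = check_jsonl_lines(demo_lines)
print(demo_count, demo_problems)
```

To check a file produced by the commands on this page, call `check_jsonl_file("datasets/rss_examples.jsonl")` and inspect the returned list of problems.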