# Auto Shorts: Analytics, Learning Agent & Experimentation Framework
**Date:** 2026-04-09
**Duration:** ~4 hours
**Repos touched:** auto-shorts, shorts-pipeline
## Context & Motivation
The Auto Shorts system generates YouTube Shorts from long-form cooking videos for the Chef Agathe channels (EN + FR), but until now there was no visibility into how the clips perform. The user wanted to:
1. Measure how different titles, descriptions, and cutting approaches perform
2. Build a system that learns per-channel what resonates and applies those learnings automatically
3. Create an experimentation framework to test new approaches, measure results, and graduate winners
The goal is to maximize a mix of views, watch completions, and shares.
## Decisions Made
### Analytics via YouTube APIs (not scraping)
- **Decision:** Use YouTube Data API v3 for basic stats + YouTube Analytics API for watch completion
- **Alternatives:** YouTube Studio scraping, manual data entry
- **Rationale:** API is reliable, automated, and gives per-video granularity
- **Trade-offs:** Analytics API requires a separate OAuth scope (`yt-analytics.readonly`), which needed GCP Console changes and channel re-auth
### Server-side token exchange (not worker-dependent)
- **Decision:** The OAuth callback exchanges tokens directly via Google's token endpoint in Node.js
- **Alternatives:** The original design delegated token exchange to the Python worker
- **Rationale:** The worker wasn't running on the VM; a server-side exchange is self-contained
- **Trade-offs:** Token files are stored on the VM in `data/tokens/` rather than in the shorts-pipeline repo
### Learnings injected via instructions field (zero pipeline changes)
- **Decision:** Channel learnings and experiment instructions are prepended to the job’s `instructions` field at poll time
- **Alternatives:** Modify pipeline.py to read learnings directly, add a new field to the job schema
- **Rationale:** The worker’s `auto_select_preset()` and `analyze_transcript()` already read instructions — prepending context is invisible to the pipeline
- **Trade-offs:** Instructions can get long with many learnings; capped at the top 10
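A minimal sketch of this prepend-at-poll-time approach (function and field names here are illustrative, not the actual implementation):

```python
def inject_context(instructions, learnings, experiment_instructions=None, cap=10):
    """Prepend active channel learnings (top `cap`) and any active
    experiment's instructions to a job's free-text instructions field."""
    lines = []
    active = [l for l in learnings if l.get("active")][:cap]
    if active:
        lines.append("Channel learnings (apply where relevant):")
        lines.extend(f"- [{l['category']}] {l['insight']}" for l in active)
    if experiment_instructions:
        lines.append(f"Experiment: {experiment_instructions}")
    if instructions:
        lines.append(instructions)
    return "\n".join(lines)
```

Because the worker already parses the instructions field, the extra context arrives as ordinary user guidance and no pipeline change is required.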
### Experiments seed jobs immediately by default
- **Decision:** Creating an experiment pulls unprocessed videos from the library and creates jobs right away
- **Alternatives:** Wait for auto-poll to find new uploads (could take days)
- **Rationale:** User wanted experiments to start producing results quickly; the existing library has 140+ EN and 66+ FR unprocessed videos
- **Trade-offs:** Uses existing library videos rather than only new uploads. “Wait for next auto-generate” is available as an opt-out
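The seed-immediately behavior can be sketched as follows (field names are hypothetical; the real schema lives in the Node.js queue code):

```python
def seed_experiment_jobs(experiment, library, target_count):
    """Pick unprocessed library videos and create one job per video,
    tagged with the experiment so results can be compared later."""
    picked = [v for v in library if not v.get("processed")][:target_count]
    return [
        {
            "video_id": v["id"],
            "experiment_id": experiment["id"],
            "instructions": experiment["instructions"],
        }
        for v in picked
    ]
```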
## What Was Built / Changed
### Phase 1: YouTube Analytics Dashboard
- **New tables:** `clip_analytics_snapshots` (per-clip stats over time), `analytics_fetch_log`
- **Migrations:** `shorts_clips.channel_id`, `shorts_channels.analytics_scope`, `shorts_channels.last_analytics_fetch`
- **8 analytics query functions** in shorts-queue.js, including preset comparison, channel summaries, snapshot history
- **Python fetcher** (`shorts-pipeline/fetch_analytics.py`): pulls stats from the YouTube Data API, batches 50 videos per request, posts to the server. Auto-syncs tokens from the server for scope changes
- **3 dashboard views:** main overview (preset comparison bar chart, top performers), channel detail (sortable table), clip detail (growth chart with deltas)
- **Nav update:** Analytics link added to all 7 existing EJS views
- 12 new tests
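The 50-per-request batching follows the Data API's documented limit of 50 IDs per `videos.list` call; a sketch of the chunking (the request itself is elided):

```python
def chunk_ids(video_ids, size=50):
    """Split video IDs into batches of up to 50, the maximum the
    YouTube Data API v3 videos.list endpoint accepts per request."""
    return [video_ids[i:i + size] for i in range(0, len(video_ids), size)]

# Each chunk then becomes one request, roughly:
#   GET https://www.googleapis.com/youtube/v3/videos
#       ?part=statistics&id=<comma-joined chunk>
```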
### Phase 2: Deep Analytics (YouTube Analytics API)
- GCP Console: YouTube Analytics API enabled, `yt-analytics.readonly` scope added to the OAuth consent screen (done manually via Cowork)
- OAuth auth-url route supports `?analytics=1` to request the extended scope
- Settings UI shows an “Enable Deep Analytics” / “Re-authenticate Analytics” button per channel
- `auth_channel.py --analytics` flag for CLI re-auth
- `fetch_analytics.py --mode full` pulls averageViewDuration, averageViewPercentage, shares
- Both Chef Agathe channels re-authed with the analytics scope, initial data seeded (20 clips)
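The full-mode metrics map onto the Analytics API's `reports.query` endpoint. A sketch of the query such a fetcher might send (parameter names follow the public v2 API; the exact request fetch_analytics.py builds is an assumption):

```python
def build_full_mode_params(video_id, start_date, end_date):
    """Query params for GET https://youtubeanalytics.googleapis.com/v2/reports.
    Requires an OAuth token carrying the yt-analytics.readonly scope."""
    return {
        "ids": "channel==MINE",
        "startDate": start_date,
        "endDate": end_date,
        "metrics": "averageViewDuration,averageViewPercentage,shares",
        "filters": f"video=={video_id}",
    }
```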
### Server-Side Token Exchange
- Replaced the worker-dependent exchange with a direct Node.js exchange via Google’s token endpoint
- Tokens saved to `data/tokens/` on the VM
- New `GET /worker/token` route lets the local fetcher pull tokens remotely
- `fetch_analytics.py` auto-syncs tokens from the server when running in full mode
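The server-side exchange is a standard authorization-code grant against Google's token endpoint; shown here in Python for consistency with the other sketches (the actual implementation is Node.js):

```python
import urllib.parse

TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"

def token_exchange_body(code, client_id, client_secret, redirect_uri):
    """Form-encoded body POSTed to the token endpoint to swap an
    authorization code for access + refresh tokens."""
    return urllib.parse.urlencode({
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
        "grant_type": "authorization_code",
    })
```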
### Learning Agent
- **New file:** `lib/shorts-learning-agent.js` — periodic script that reviews per-channel analytics, sends data to Claude, extracts structured insights
- **New table:** `channel_learnings` (category, insight, confidence, sample_size, source, active)
- Claude prompt includes all clip data, preset comparison, and current learnings for refinement
- Insights categorized: duration, content_type, style, timing, general
- Confidence tied to sample size: low (<15), medium (15-30), high (>30)
- Can be triggered manually via `POST /learnings/:channelId/run` or run as a standalone script
- Initial run produced 10 EN insights and 8 FR insights
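The sample-size thresholds reduce to a small helper (a sketch; the actual logic lives in shorts-learning-agent.js):

```python
def confidence_for(sample_size):
    """Map a learning's sample size to a confidence bucket:
    low (<15), medium (15-30), high (>30)."""
    if sample_size < 15:
        return "low"
    if sample_size <= 30:
        return "medium"
    return "high"
```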
### Experimentation Framework
- **New table:** `channel_experiments` (name, hypothesis, instructions, target_count, produced_count, status)
- **New table:** `experiment_suggestions` (persisted AI-generated suggestions)
- **Migration:** `shorts_clips.experiment_id` tags clips to experiments
- **Experiment lifecycle:** active → completed → graduated/rejected. Queued experiments auto-activate when the current one completes
- **Worker integration:** `/worker/poll` injects learnings + experiment instructions into the job’s instructions field; `/worker/update` tags clips with experiment_id and increments produced_count
- **Auto-poll awareness:** New jobs check for active experiments and attach experimentId in metadata
- **Job seeding:** `seedExperimentJobs()` creates jobs from existing library videos immediately
- **Results comparison:** `getExperimentComparison()` compares experiment cohort vs. baseline metrics
- **Graduation:** Converts experiment instructions into permanent channel learnings
- **AI suggestions:** `POST /experiments/:channelId/suggestions` generates 3 experiment ideas via Claude, persisted for later use. “Start This” / “Start All” buttons
- **UI:** Experiments page with active experiment progress bar, create form (Start Now vs. Wait), queued experiments section, past experiments with status badges, suggestion cards with prior suggestions collapsible
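The cohort-vs-baseline comparison amounts to averaging a metric over each group and reporting the lift. A minimal sketch of what a function like getExperimentComparison() computes (the exact metrics and return shape are assumptions):

```python
from statistics import mean

def compare_cohorts(experiment_clips, baseline_clips, metric="views"):
    """Average `metric` across the experiment cohort and the baseline,
    and report the relative lift in percent."""
    exp = mean(c[metric] for c in experiment_clips)
    base = mean(c[metric] for c in baseline_clips)
    lift = round(100 * (exp - base) / base, 1) if base else None
    return {"experiment": exp, "baseline": base, "lift_pct": lift}
```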
### Weekly Learnings Summary
- **New file:** `lib/shorts-weekly-summary.js` — builds a per-channel digest
- Sends email via SMTP (pezant.projects@gmail.com) to channel owner emails
- Posts to the Discord #shorts-learnings channel (webhook created)
- Includes learnings, experiment status, and a how-to guide
- Cron: every Monday at 9:03am on the VM
### Operational Improvements
- **Stale job sweeper:** Runs every poll cycle, marks jobs stuck >2 hours as failed
- **Channel switcher:** Dropdown on all per-channel pages (analytics, learnings, experiments)
- **Auto-generate toggle:** Replaced checkbox with a clear ON/OFF button + confirmation dialog
- **Static index.html fix:** Removed `public/index.html` that was overriding the EJS dashboard route
- **Channel lookup fix:** Routes use `getChannelById()` instead of filtering by the logged-in email
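The sweeper amounts to one UPDATE over jobs stuck in a processing state. A sketch against SQLite (table and column names are assumptions based on the schema described above):

```python
import sqlite3
import time

def sweep_stale_jobs(db_path, max_age_hours=2):
    """Mark jobs that have been 'processing' for longer than
    max_age_hours as failed, and return how many were swept."""
    cutoff = time.time() - max_age_hours * 3600
    con = sqlite3.connect(db_path)
    with con:  # commits on success
        cur = con.execute(
            "UPDATE shorts_jobs SET status = 'failed', "
            "error = 'stale: exceeded 2h' "
            "WHERE status = 'processing' AND started_at < ?",
            (cutoff,),
        )
    swept = cur.rowcount
    con.close()
    return swept
```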
## Architecture & Design
```
┌─────────────────┐
│ YouTube APIs │
│ Data + Analytics│
└────────┬────────┘
│
┌──────────────┐ poll/update ┌────────┴────────┐ fetch_analytics.py
│ Python Worker │◄──────────────────►│ auto-shorts │◄───────────────────
│ (local WSL) │ learnings + │ server (VM) │ posts snapshots
│ pipeline.py │ experiment │ :3007 │
│ │ injected into ├─────────────────┤
│ │ instructions │ SQLite DB │
└──────────────┘ │ ├ shorts_jobs │
│ ├ shorts_clips │ ┌─────────────┐
┌──────────────┐ │ ├ clip_analytics│───►│ Dashboard │
│ Learning │ Claude analysis │ ├ channel_ │ │ /analytics │
│ Agent │◄──────────────────►│ │ learnings │ │ /learnings │
│ (6h cron) │ extract insights│ ├ channel_ │ │ /experiments│
└──────────────┘ │ │ experiments │ └─────────────┘
│ └ experiment_ │
┌──────────────┐ │ suggestions │ ┌─────────────┐
│ Weekly │ └─────────────────┘───►│ Email + │
│ Summary │ │ Discord │
│ (Mon 9am) │ └─────────────┘
└──────────────┘
Feedback loop:
Analytics fetched → Learning agent analyzes → Insights stored →
Worker poll injects learnings into prompt → Better clips produced →
Analytics measured → Cycle repeats
```
## Learnings Captured
| Learning | Location |
|----------|----------|
| Test before asking user (channel lookup email mismatch, static index.html) | memory/feedback_test_before_asking.md (updated) |
| OAuth scope mismatch causes RefreshError when tokens have different scope | shorts-pipeline fix in fetch_analytics.py |
| duration_seconds is NULL for synced videos (playlistItems API doesn’t return it) | auto-shorts getUnprocessedVideos fix |
## Open Items & Follow-ups
1. **Worker rate limiting:** Currently processes back-to-back. Could add configurable delay between jobs to prevent YouTube 429 errors on large batches
2. **Learning agent cron:** Not yet set up as PM2 cron on VM — currently manual or via API trigger. Should be every 6 hours
3. **Queued experiments:** 6 experiments queued across both channels — will auto-activate as current ones complete
4. **Analytics fetch cron:** Should set up a periodic `fetch_analytics.py --all --mode full` run to keep stats current
5. **yt-dlp deprecation:** Worker logs warn about missing JS runtime — may need `deno` installed
6. **Experiment results review:** Once the 2 active experiments complete their 5 shorts each, results should be reviewed and graduated/rejected
## Key Files
### auto-shorts (Node.js server)
- `lib/shorts-queue.js` — All DB tables, migrations, query functions
- `lib/shorts-routes.js` — All routes (analytics, learnings, experiments, worker)
- `lib/shorts-auto-poll.js` — Auto-generation, experiment seeding, stale job sweeper
- `lib/shorts-learning-agent.js` — Periodic Claude-based analysis
- `lib/shorts-weekly-summary.js` — Weekly email + Discord digest
- `views/auto-shorts-analytics.ejs` — Main analytics dashboard
- `views/auto-shorts-analytics-channel.ejs` — Channel detail with sort/filter
- `views/auto-shorts-analytics-clip.ejs` — Clip detail with growth chart
- `views/auto-shorts-learnings.ejs` — Per-channel learnings management
- `views/auto-shorts-experiments.ejs` — Experiment creation, suggestions, results
- `ANALYTICS_SETUP.md` — Setup guide and cowork handoff instructions
### shorts-pipeline (Python worker tools)
- `fetch_analytics.py` — YouTube stats fetcher with token sync
- `auth_channel.py` — OAuth management with `--analytics` flag