# Deep Closeout: Discord Quality Gap Fix — URL Pre-Fetching, Retry Detection, Page-Reader Fallback

**Date:** 2026-04-06
**Duration:** ~45 minutes
**Repos touched:** centralDiscord, buying-assistant, agentGuidance

## Context & Motivation

The user reported that a buying guide request dispatched through Discord failed to read a shared Gemini link (`gemini.google.com/share/1692639e3f4d`). The agent used WebFetch, which returned empty content because the page is a JS-rendered SPA, and then proceeded without the context. The user’s original requirements (hard wax oil for a pine nightstand) were in that Gemini link, so the agent produced a generic hard wax oil guide from web search alone.

A retry attempt also failed: the user pasted the prior bot output back with a correction (“use the page-reader util”), but the agent saw the pasted output and said “I don’t see a specific task.”

The user also asked: “Why do my requests made through Discord seem to be dumber and less successful than requests made directly here?”

## Root Cause Analysis

1. **WebFetch silently fails on JS SPAs.** Gemini, React apps, and modern forums return empty or broken HTML without JavaScript execution. The buying-assistant CLAUDE.md had no instructions about fallback tooling.

2. **No URL pre-fetching in the Discord pipeline.** Unlike the interactive CLI, where the user can redirect on tool failure, Discord’s single-shot execution means the agent has to discover, attempt, fail, and recover from link-reading failures on its own. Most agents don’t recover.

3. **No retry detection.** When users paste prior bot output back with corrections, the bot passes the entire mess (bot output + user correction) as the query. The agent sees a completed report and gets confused.

4. **Structural quality gap (Discord vs CLI):**
- `planMode: 'skip'` and `clarifyAmbiguous: 'best-effort'` mean no interactive recovery
- No feedback loop when tools fail mid-execution
- Prompt wrapping dilutes original user intent
- Route classifier sent URL-only messages to debate instead of direct execution

## Decisions Made

### Decision: Pre-fetch URLs server-side in contextFetcher
- **Alternatives considered:** (a) Only add page-reader instructions to agent directives, (b) Build a dedicated URL proxy service, (c) Pre-fetch only for specific channels
- **Rationale:** Server-side pre-fetch solves the problem for ALL agents without relying on each agent knowing how to use page-reader. The content is injected into the prompt before the agent even starts. Falls back gracefully if page-reader isn’t available.
- **Trade-offs:** Adds 3-20 seconds latency per URL. Max 3 URLs pre-fetched. 6000 char budget per URL.
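The limits in the trade-offs above (at most 3 URLs, a 6000-character budget per URL, and a timeout per fetch) could be enforced along the following lines. This is a minimal sketch, not the code in `contextFetcher.js`: the names `prefetchAll`, `toPromptBlocks`, and `withTimeout` are hypothetical, and `readPage` is an injected stand-in for whatever actually invokes page-reader.

```javascript
// Limits quoted in the trade-offs above; the constant names are hypothetical.
const MAX_URLS = 3;
const CHAR_BUDGET = 6000;
const TIMEOUT_MS = 20000;

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Turn settled results into prompt-ready text, one entry per successful URL.
// Failed or empty fetches are dropped so the request can still proceed.
function toPromptBlocks(urls, settled) {
  const blocks = [];
  settled.forEach((result, i) => {
    if (result.status !== 'fulfilled' || !result.value) return;
    const text = String(result.value).slice(0, CHAR_BUDGET);
    blocks.push(`[${urls[i]}]\n${text}`);
  });
  return blocks;
}

// `readPage(url)` is an assumed async function resolving to the page text.
async function prefetchAll(urls, readPage, timeoutMs = TIMEOUT_MS) {
  const selected = urls.slice(0, MAX_URLS);
  const settled = await Promise.allSettled(
    selected.map((url) =>
      // Promise.resolve().then(...) also captures synchronous throws.
      withTimeout(Promise.resolve().then(() => readPage(url)), timeoutMs)
    )
  );
  return toPromptBlocks(selected, settled);
}
```

Using `Promise.allSettled` rather than `Promise.all` means one slow or broken URL does not discard the content fetched from the others. Note that this timeout only stops waiting; it does not terminate any underlying process.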

### Decision: Belt-and-suspenders approach (pre-fetch + directive + CLAUDE.md)
- **Alternatives considered:** Only pre-fetching (single layer)
- **Rationale:** Pre-fetching handles URLs in the initial request, but agents discover new URLs during execution (e.g., following links in pre-fetched content). The directive and CLAUDE.md instructions cover those cases.
- **Trade-offs:** Slight prompt bloat from the directive instruction (~1 sentence).

### Decision: Bot output stripping at handler level, not buildRequestContext
- **Alternatives considered:** Strip in buildRequestContext (affects all channels)
- **Rationale:** Retry-with-paste is specific to channel watchers (buying-guides, job-search) where the same channel holds both the original request and bot responses. The #requests channel has session chaining for follow-ups. Stripping at handler level is more targeted.
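To illustrate the retry detection this decision relies on, here is a minimal sketch of separating pasted bot output from the user's correction. The marker regexes are guesses based on the patterns named in this closeout, and the heuristic (the correction is whatever follows the last marker line, or failing that, whatever precedes the first) is an assumption, not necessarily what `stripBotOutput()` does. The function name and return shape are hypothetical.

```javascript
// Marker patterns named in this closeout; the exact regexes are assumptions.
const BOT_MARKERS = [
  /Actions Taken/i,
  /Session:\s*[0-9a-f-]{36}/i,
  /\b(Summary|Recommendation):/i,
];

// Returns { isRetry, query }. Assumes the pasted bot output ends with its
// last marker line (for example the session footer), so anything after it
// is the user's own text.
function stripBotOutputSketch(text) {
  const lines = text.split('\n');
  const hits = [];
  lines.forEach((line, i) => {
    if (BOT_MARKERS.some((re) => re.test(line))) hits.push(i);
  });

  // No markers: treat the message as an ordinary request.
  if (hits.length === 0) return { isRetry: false, query: text.trim() };

  const after = lines.slice(hits[hits.length - 1] + 1).join('\n').trim();
  const before = lines.slice(0, hits[0]).join('\n').trim();
  return { isRetry: true, query: after || before };
}
```

A handler could then pass `query` to the agent and use `isRetry` to decide whether to add a retry note to the prompt.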

### Decision: URL-dominated messages classify as TASK
- **Alternatives considered:** Keep fail-open to debate
- **Rationale:** Users dropping links expect action. A message that’s >40% URL with no debate signals (no “should we”, “vs”, “tradeoffs”) is almost certainly a task. Debate adds 30-60s of unnecessary overhead.
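The rule above can be sketched as a small predicate. This is an illustration under assumptions: the function name, the URL regex, and the exact debate-signal patterns are hypothetical, and only the 40% threshold and the three example signals come from the rationale.

```javascript
const URL_RE = /https?:\/\/\S+/g;
const URL_SHARE_THRESHOLD = 0.4;

// Debate signals from the rationale; the regex forms are assumptions.
const DEBATE_SIGNALS = [/\bshould we\b/i, /\bvs\.?\b/i, /\btrade-?offs?\b/i];

// True when more than 40% of the message's characters belong to URLs and
// the non-URL text carries no debate signal.
function isUrlDominatedTask(message) {
  const text = message.trim();
  if (text.length === 0) return false;

  const urls = text.match(URL_RE) || [];
  const urlChars = urls.reduce((sum, u) => sum + u.length, 0);

  // Check signals on the text with URLs removed, so a path segment such as
  // "a-vs-b" inside a link does not count as a debate signal.
  const prose = text.replace(URL_RE, ' ');
  const hasDebateSignal = DEBATE_SIGNALS.some((re) => re.test(prose));

  return urlChars / text.length > URL_SHARE_THRESHOLD && !hasDebateSignal;
}
```

Under this sketch, a bare shared link classifies as a task, while "should we use A or B" with two links attached does not.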

## What Was Built / Changed

### centralDiscord (commit c86983c)

**`src/bot/contextFetcher.js`** (184 lines added):
- `stripBotOutput(text)`: Detects “Actions Taken”, “Session: UUID”, “Summary, Recommendation:” patterns and extracts user intent from retry messages
- `extractExternalUrls(text)`: Finds URLs in text, filters out Discord/media/GitHub domains
- `prefetchUrl(url, readerPath)`: Runs page-reader CLI on a single URL with 20s timeout, stealth mode, 3s wait
- `prefetchExternalUrls(text)`: Orchestrates parallel pre-fetch of up to 3 URLs
- `getPageReaderPath()`: Resolves page-reader location (VM vs local WSL)
- Pre-fetch is integrated into `buildRequestContext()`: it runs after attachments and before reply context, and injects `--- PRE-FETCHED URL CONTENT ---` blocks into the prompt.
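As an illustration of the URL extraction and domain filtering described in this list, the sketch below pulls candidate links out of a message and drops skipped hosts. It is not the shipped `extractExternalUrls()`: the function name, the regex, and the contents of the skip list are assumptions (the real list may differ).

```javascript
// Hypothetical skip list; the real SKIP_DOMAINS contents may differ.
const SKIP_DOMAINS = [
  'discord.com', 'discordapp.com',
  'youtube.com', 'youtu.be',
  'twitter.com', 'x.com',
  'github.com',
];

// Returns up to `limit` unique external URLs, in order of appearance.
function extractExternalUrlsSketch(text, limit = 3) {
  const candidates = text.match(/https?:\/\/[^\s<>"')]+/g) || [];
  const seen = new Set();
  const urls = [];

  for (const raw of candidates) {
    // Drop sentence punctuation that trails a pasted link.
    const cleaned = raw.replace(/[.,;:!?]+$/, '');

    let host;
    try {
      host = new URL(cleaned).hostname.toLowerCase();
    } catch {
      continue; // not a parseable URL
    }

    // Match the domain itself and any subdomain (e.g. cdn.discordapp.com).
    const skipped = SKIP_DOMAINS.some((d) => host === d || host.endsWith(`.${d}`));
    if (skipped || seen.has(cleaned)) continue;

    seen.add(cleaned);
    urls.push(cleaned);
    if (urls.length === limit) break;
  }
  return urls;
}
```

Matching on the parsed hostname rather than on a substring of the URL avoids skipping unrelated pages whose path happens to contain a skipped domain name.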

**`src/bot/index.js`**:
- Buying-guide handler now calls `stripBotOutput()` on the raw query
- If retry detected, adds `IMPORTANT: The user is retrying…` note to the prompt
- Imported `stripBotOutput` from contextFetcher

**`src/bot/executor.js`**:
- `EXECUTE_DIRECTIVE` now includes page-reader fallback instruction

**`src/bot/jobRequest.js`**:
- `buildDirective()` now appends page-reader fallback instruction to all structured directives

**`src/bot/routeClassifier.js`**:
- Added URL-dominated message detection (>40% URL chars, no debate signals = TASK)

### buying-assistant (commit 31f1f92)

**`CLAUDE.md`**:
- Added “Handling Shared Links” section between Phase 1 and Phase 2
- Instructions: try WebFetch first, fall back to page-reader, never skip shared links
- Explicit command syntax for page-reader with stealth mode

### agentGuidance (commit 902cf41)

**`agent.md`**:
- Added page-reader fallback as a Core Principle (line 29)
- All agents now pick up the rule, not just buying-assistant

### buying-assistant (commit 178493f)

**`guides/hard-wax-oil/recommendation.md`**:
- Full buying guide produced from the Gemini link content that originally failed
- Corrected Gemini’s oversized Rubio recommendation ($63 for 390ml -> $32 for 130ml)
- Found General Finishes HWO ($26) as best value, which Gemini missed entirely
- Verdict: GF HWO 8oz for best overall, Rubio 130ml for purist, Fiddes 250ml for budget

## Learnings Captured

| Learning | Where Saved |
|---|---|
| Page-reader fallback for JS SPAs | agentGuidance/agent.md (Core Principles), buying-assistant/CLAUDE.md, centralDiscord executor.js + jobRequest.js directives |
| URL pre-fetching for Discord pipeline | centralDiscord/src/bot/contextFetcher.js (implementation) |
| Bot output stripping for retries | centralDiscord/src/bot/contextFetcher.js + index.js |
| URL-dominated messages are tasks | centralDiscord/src/bot/routeClassifier.js |

## Open Items & Follow-ups

1. **Monitor page-reader latency in production.** Pre-fetching adds 3-20s per URL. If this becomes a bottleneck, consider caching or a dedicated pre-fetch worker.
2. **VM needs page-reader deployed.** The bot runs on the VM; `getPageReaderPath()` checks both `~/page-reader/` and `~/repos/page-reader/`. Verify the VM path exists and has Playwright dependencies installed.
3. **Other channel watchers.** Job-search handler could also benefit from bot-output stripping if users retry there.
4. **Test the full pipeline end-to-end.** Drop a Gemini link in #buying-guides after the restart and verify pre-fetching works.
5. **SKIP_DOMAINS list may need tuning.** Currently skips Discord, YouTube, Twitter, GitHub. May want to add or remove domains based on real usage patterns.
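For follow-up 2, a path check along these lines could confirm which page-reader location is present on a given machine. This is a sketch under assumptions: the function name is hypothetical, the order in which the two directories are tried is a guess, and it only checks that the directory exists, not that Playwright dependencies are installed.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// The two locations mentioned above, relative to the home directory.
// The order they are tried in is an assumption.
const CANDIDATE_DIRS = ['page-reader', path.join('repos', 'page-reader')];

// Returns the first existing candidate directory, or null if neither exists.
// `home` and `exists` are injectable so the lookup can be tested without
// touching the real filesystem.
function resolvePageReaderDir(home = os.homedir(), exists = fs.existsSync) {
  for (const rel of CANDIDATE_DIRS) {
    const dir = path.join(home, rel);
    if (exists(dir)) return dir;
  }
  return null;
}
```

Returning `null` instead of throwing lets the caller skip pre-fetching when page-reader is not deployed, which matches the graceful fallback described in the decisions above.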

## Key Files

- `centralDiscord/src/bot/contextFetcher.js`: URL pre-fetch + bot output stripping (the core change)
- `centralDiscord/src/bot/executor.js:189`: EXECUTE_DIRECTIVE with page-reader fallback
- `centralDiscord/src/bot/routeClassifier.js`: URL-dominated task classification
- `centralDiscord/src/bot/index.js:430-443`: Buying-guide retry handling
- `buying-assistant/CLAUDE.md`: Shared link handling instructions
- `agentGuidance/agent.md:29`: Page-reader core principle
- `buying-assistant/guides/hard-wax-oil/recommendation.md`: The guide that should have been produced originally
