Context
Built a custom scraper for congressional stock trade disclosures, covering both House and Senate, to feed the trading agent's insider-following strategy. The previous data sources (FMP, Stock Watcher S3, Finnhub free tier) were all dead or paywalled.
What Was Built
Capitol Trades RSC Parser — Primary, Both Chambers
- Parses the React Server Components (RSC) stream from capitoltrades.com
- Single HTTP request fetches 96 trades with full metadata
- Extracts: ticker, member name, party, chamber, amount range, date, sector
- No API key or headless browser needed
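A minimal sketch of how trade objects might be pulled out of the stream with balanced brace matching. It assumes the stream text has already been fetched and is plain (unescaped) JSON fragments. The function name and the `txId` / `issuer` field names are hypothetical, not confirmed Capitol Trades field names.

```python
import json
from typing import Iterator


def iter_objects_with_key(stream: str, key: str) -> Iterator[dict]:
    """Yield every balanced {...} object in `stream` that has `key` as a top-level key.

    Braces inside JSON strings are ignored, so a '}' in a text field
    does not close the object early.
    """
    starts = []          # indices of currently open '{'
    in_string = False
    escaped = False
    for i, ch in enumerate(stream):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            starts.append(i)
        elif ch == "}" and starts:
            candidate = stream[starts.pop():i + 1]
            if f'"{key}"' not in candidate:
                continue  # cheap pre-filter before parsing
            try:
                obj = json.loads(candidate)
            except json.JSONDecodeError:
                continue
            if isinstance(obj, dict) and key in obj:
                yield obj


# Hypothetical stream fragment; real rows will look different.
sample = '0:["$","div",null,{"data":[{"txId":1,"note":"a } b"},{"txId":2,"issuer":{"ticker":"SBUX"}}]}]'
trades = list(iter_objects_with_key(sample, "txId"))
```

Checking for the key after parsing keeps wrapper objects (which merely contain trades) out of the result.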
House Clerk PDF Scraper — Fallback
- Downloads the annual FD ZIP from disclosures-clerk.house.gov to get the filing index
- Fetches individual PTR PDFs, extracts transactions via pdfplumber
- House only, but provides granular transaction detail
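A rough sketch of the fallback's two steps, assuming the ZIP and PDF bytes have already been downloaded. The index file name (`<year>FD.txt`), the tab-separated layout, and the `FilingType` / `DocID` column names with `P` marking PTRs are assumptions about the Clerk's format, and the function names are hypothetical. The PDF step only pulls raw page text; turning that text into transactions is not shown.

```python
import csv
import io
import zipfile


def ptr_doc_ids(zip_bytes: bytes, year: int) -> list:
    """Return DocIDs of periodic transaction reports listed in the annual index."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # Assumed member name, e.g. "2025FD.txt"
        with zf.open(f"{year}FD.txt") as fh:
            text = io.TextIOWrapper(fh, encoding="utf-8-sig", newline="")
            reader = csv.DictReader(text, delimiter="\t")
            # Assumed: FilingType "P" marks a PTR
            return [row["DocID"] for row in reader if row.get("FilingType") == "P"]


def ptr_pdf_text(pdf_bytes: bytes) -> str:
    """Extract raw text from one PTR PDF with pdfplumber."""
    import pdfplumber  # third-party; imported lazily so the index step works without it

    with pdfplumber.open(io.BytesIO(pdf_bytes)) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)
```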
Three-Tier Architecture
Sources are tried in order: Capitol Trades RSC first, then House Clerk PDFs, then the Finnhub API as a last resort. Each tier degrades gracefully, so a failure in one falls through to the next.
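The tier ordering could be expressed roughly as below. This is a sketch, not the committed code: `fetch_with_fallback` and the fetcher callables are hypothetical names, and treating an empty result as a failure is an assumption.

```python
import logging
from typing import Callable, Iterable, List, Tuple

Trade = dict
Source = Tuple[str, Callable[[], List[Trade]]]


def fetch_with_fallback(sources: Iterable[Source]) -> Tuple[str, List[Trade]]:
    """Try each (name, fetch) pair in order; return the first non-empty result."""
    for name, fetch in sources:
        try:
            trades = fetch()
        except Exception as exc:  # any tier failure falls through to the next tier
            logging.warning("%s failed: %s", name, exc)
            continue
        if trades:
            return name, trades
        logging.warning("%s returned no trades", name)
    return "none", []
```

A call would pass the tiers in priority order, for example `fetch_with_fallback([("capitol_trades", fetch_rsc), ("house_clerk", fetch_pdfs), ("finnhub", fetch_finnhub)])`, where the three fetchers are placeholders.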
Key Decisions
- Capitol Trades RSC over direct government sites: Senate EFD currently returns 503 (site maintenance). House Clerk works but is House-only and slow. Capitol Trades covers both chambers in one fast request.
- Balanced brace JSON extraction: RSC stream embeds trade objects as serialized JSON. Balanced brace matching isolates each object reliably.
- Value-to-range mapping: Capitol Trades provides midpoint values, e.g. 8000, which are mapped back to STOCK Act disclosure ranges such as 1001-15000.
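The value-to-range mapping can be sketched as a lookup over the upper bounds of the standard disclosure brackets. The function name, the label format, and the handling of values below 1001 are assumptions for illustration.

```python
from bisect import bisect_left
from typing import Optional

# Upper bounds of the disclosure brackets, in dollars
_UPPER = [15_000, 50_000, 100_000, 250_000, 500_000,
          1_000_000, 5_000_000, 25_000_000, 50_000_000]
_LABELS = ["1001-15000", "15001-50000", "50001-100000", "100001-250000",
           "250001-500000", "500001-1000000", "1000001-5000000",
           "5000001-25000000", "25000001-50000000", "50000001+"]


def value_to_range(value: float) -> Optional[str]:
    """Map a reported (midpoint) dollar value to its disclosure range label."""
    if value < 1_001:
        return None  # assumed: below the lowest bracket, nothing to map
    # bisect_left finds the first bracket whose upper bound is >= value
    return _LABELS[bisect_left(_UPPER, value)]


print(value_to_range(8000))     # 1001-15000
print(value_to_range(175_000))  # 100001-250000
```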
Data Quality
- 1040+ total records — House: 1031, Senate: 9
- 72+ unique tickers
- Notable: Jim Banks (R, Senate) selling SBUX, Boozman buying NVDA, Biggs purchasing $100K-250K of IBIT
- Prompt shows House/Senate labels with party affiliations and committee relevance
Commits
da8df04 on master in trading-agent
Open Items
- Monitor Senate EFD for when it comes back online
- Capitol Trades RSC format could change if they update their Next.js rendering
- FINRA short interest endpoint discovery still pending: the token works, but the endpoint paths have changed