Deep Closeout: AD Priority Rebalance — UX/Features Over Tests

# Autonomous Dev Priority Rebalance: UX/Features Over Tests
**Date:** 2026-04-07
**Duration:** ~30 minutes
**Repos touched:** autonomousDev

## Context & Motivation

After reviewing the overnight autonomous dev journal (runs #12–#25), the user observed that the agent was heavily skewed toward shipping tests, a11y improvements, and lint fixes — "productive but not advancing the products." The user explicitly stated: **"I'm more interested in UX and feature advancement than shipping tests and accessibility improvements."**

Additionally, at 22% of the 7-day usage budget after only one night of runs, the burn rate was unsustainable for a full week. The user requested a proposal-only mode when usage crosses 50% of the 7-day window.

## Decisions Made

### 1. Feature Run Frequency: Every 5th → Every 2nd
- **Decision:** Feature runs now fire on every even-numbered run (50% of all runs)
- **Alternatives considered:** Every 3rd run (33%), making feature the default with maintenance as exception
- **Rationale:** 50/50 split balances forward progress with maintenance; going higher risks missing real bugs
- **Trade-offs:** Less time for test coverage and code quality sweeps

### 2. Priority Reordering: UX/Features Promoted, Tests/A11y Demoted
- **Decision:** UX improvements, new user-facing features, and design system alignment moved to Medium priority; tests and a11y demoted to Low
- **Alternatives considered:** Keeping tests at Medium and only boosting features on feature runs
- **Rationale:** Even on maintenance runs, a UX fix should outrank a test addition when both are available
- **Trade-offs:** Test coverage will grow more slowly; a11y improvements become opportunistic

### 3. Feature Run Valid Work List Rewritten
- **Decision:** Top-tier feature work is now explicitly UX-focused (flows, feedback, interactions, visual polish, performance). Tests and a11y are explicitly excluded from feature runs.
- **Rationale:** Previous list included tests and a11y as valid feature work, which let the agent gravitate to them on feature runs too
- **Trade-offs:** None significant — tests and a11y still happen on maintenance runs

### 4. Proposal Mode When 7d Usage > 50%
- **Decision:** When `USAGE_7D > 50%`, the agent enters proposal-only mode — scans repos, identifies the best improvement, writes a detailed proposal, but does NOT execute
- **Alternatives considered:** Harder throttling (skip more runs), lower hard-stop threshold
- **Rationale:** User's words: "it used like 20% overnight and there's a whole week to go." Proposals preserve the agent's judgment without burning budget, and the user can sign off on high-value proposals for execution
- **Trade-offs:** Proposals still cost tokens (scan + reasoning), but much less than full execution

## What Was Built / Changed

### prompt.md Changes
1. **Medium Priority section** — Added: UX improvements, new user-facing features, performance users can feel, design system alignment. Kept: code quality, dependency updates. Removed to Low: tests, a11y, documentation gaps.
2. **Low Priority section** — Added: tests, a11y, documentation gaps (from Medium). Kept everything else.
3. **Feature Runs section** — Frequency changed to "every 2nd run." Valid work rewritten with explicit top-tier (UX/visual/features/performance/design system) and acceptable tiers. Tests and a11y explicitly listed as "not valid on feature runs."
4. **New Proposal Mode section** — Explains scan-only behavior, `PROPOSAL:` output block format.
5. **Output Format section** — Added `PROPOSAL:` block template.

### run.sh Changes
1. **Feature run trigger:** `RUN_NUMBER % 5` → `RUN_NUMBER % 2`
2. **Proposal mode gate:** New block after feature run detection — checks `USAGE_7D > 50`, sets `PROPOSAL_MODE=true`
3. **Template injection:** `{{PROPOSAL_MODE}}` substituted into prompt
4. **Discord posting:** New block posts `PROPOSAL:` results to `#autonomous-dev-merges` with approval reaction prompt
5. **Outcome tracking:** `proposal_mode` field added to JSONL entries

### Commit
- `7b76824` — "Shift AD focus toward UX/features, add proposal mode for budget conservation"

## Validation

Two test runs executed:
- **Run #29 (maintenance):** Correctly picked high-priority security work (Vite vulns in claude-token-tracker, PR #10) over new feature/UX items — priority hierarchy working
- **Run #30 (feature run):** Picked **groceryGenius category grouping** — a real user-facing feature adding "By aisle" toggle with 11 grocery categories, icons, and section headers (~207 lines). Exactly the kind of work the rebalance was targeting.

## Open Items & Follow-ups

- **Monitor next 24h of runs** — Verify the 50/50 split produces a good mix in practice
- **Proposal mode untested** — 7d usage is only at 24%, so proposal mode hasn't triggered yet. Will activate organically as usage climbs.
- **Feature ideas files** — The per-repo `context/*-features.md` files were written under the old regime. May want to refresh them with more UX-oriented ideas.
- **Discord webhook** — `AUTONOMOUS_DEV_WEBHOOK` was unset during test runs (local execution). Verify webhook posting works on next cron-triggered run.

## Key Files

- `~/repos/autonomousDev/prompt.md` — Main agent prompt with priority hierarchy
- `~/repos/autonomousDev/run.sh` — Runner with feature frequency, proposal mode, usage gating
- `~/repos/autonomousDev/logs/outcomes.jsonl` — Run outcome tracking

Leave a Reply Cancel reply