Do the Simplest Thing: Building Agents for Models That Keep Getting Better
Anthropic put it bluntly in their “Effective context engineering for AI agents” post: “As model capabilities improve, agentic design will trend towards letting intelligent models act intelligently, with progressively less human curation.” In other words, the slope of model quality matters more than the scaffolding we bolt on today. This post is a quick guide to building AI agents that stay simple, adaptable, and ready to absorb better models with minimal rewrites.
Why “do the simplest thing that works” holds up
- Model slope beats orchestration: A lean agent that delegates judgment to the model benefits immediately when the frontier improves. Heavy orchestration locks you into yesterday’s assumptions.
- Less glue, fewer breakpoints: Every heuristic, DSL, or rigid state machine is a future maintenance cost. Minimize bespoke logic so you can swap models, tools, or policies without refactoring the spine.
- Bandwidth for outcomes, not ops: Teams ship faster when they prune non-essential scaffolding and focus on data quality, evals, and delivery.
Symptoms you’re over-engineering
- A forest of tool-selection heuristics that silently rot as models get smarter.
- Multi-step prompt routers that mainly compensate for a weak base model.
- “Safety” rules duplicated across prompts, middleware, and tools instead of being enforced once through centralized policies and logging.
- Latency spikes from unnecessary coordination layers (planners, sub-agents, routers) that deliver marginal quality gains.
A minimal agent blueprint (v0)
- Single planner loop: Let the model reason about goals, pick tools, and reflect—no bespoke routers unless data proves the need.
- Thin tool contracts: Keep tool interfaces small and typed; return structured errors the model can handle.
- Context discipline: Prioritize fresh, high-signal context (recent actions, user intent, tool summaries) over giant static retrieval dumps.
- Structured outputs: Ask for JSON schemas where it matters; avoid ornate formats that will break with slight phrasing shifts.
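A minimal sketch of this blueprint in Python, assuming a generic `llm(messages) -> str` chat client and a single hypothetical tool (`search_docs`); the JSON action schema, tool names, and step budget are illustrative, not tied to any particular SDK:

```python
import json
from typing import Callable

# Assumed model client: a list of chat messages in, the assistant's text out.
# Swap in whatever SDK you actually use.
LLMClient = Callable[[list[dict]], str]

def search_docs(query: str) -> dict:
    """Thin tool contract: typed args in, a structured result or error out."""
    try:
        hits = [f"stub result for {query!r}"]  # replace with a real backend call
        return {"ok": True, "results": hits}
    except Exception as exc:
        return {"ok": False, "error": type(exc).__name__, "detail": str(exc)}

TOOLS = {"search_docs": search_docs}

SYSTEM = (
    "You are an agent. Reply with JSON only: "
    '{"action": "tool" | "final", "tool": str | null, '
    '"args": object | null, "answer": str | null}'
)

def run_agent(llm: LLMClient, goal: str, max_steps: int = 8) -> str:
    # Single planner loop: the model reasons, picks tools, and reflects.
    messages = [{"role": "system", "content": SYSTEM},
                {"role": "user", "content": goal}]
    for _ in range(max_steps):
        step = json.loads(llm(messages))  # structured output where it matters
        if step["action"] == "final":
            return step["answer"]
        result = TOOLS[step["tool"]](**step["args"])
        # Context discipline: feed back the fresh tool result, not a retrieval dump.
        messages.append({"role": "assistant", "content": json.dumps(step)})
        messages.append({"role": "user", "content": json.dumps(result)})
    return "Step budget exhausted."
```

The point is the shape, not the details: one loop, tools as plain typed functions, and the model deciding what happens next.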
What to still engineer deliberately
- Guardrails at boundaries: Permissioning, rate limits, spending caps, and write-scopes around tools and external systems (sketched after this list).
- Observability first: Trace runs, log prompts/tool calls, and capture deltas in success metrics when you swap models.
- Offline and shadow evals: Use held-out tasks plus live shadow runs to see if a new model makes existing glue obsolete.
- Fallback paths: Keep fast degraded modes for latency-sensitive flows (cheap model + cached answers) even as the main loop simplifies.
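For the boundary guardrails, one option is a small wrapper at the tool layer rather than more prompt text. The sketch below assumes a flat spending cap, a static allow-list of write scopes, and a `_budget` dict threaded through calls; all of these names and mechanisms are placeholders for whatever policy layer you actually run:

```python
import functools
import json
import time

MAX_SPEND_USD = 5.00                       # assumed per-run spending cap
ALLOWED_WRITE_SCOPES = {"tickets:create"}  # assumed write allow-list

class PolicyError(Exception):
    """Raised when a tool call violates a boundary policy."""

def guarded_tool(scope: str | None = None, cost_usd: float = 0.0):
    """Enforce write scopes and spending caps at the tool boundary,
    and emit one structured trace record per call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, _budget: dict, **kwargs):
            if scope is not None and scope not in ALLOWED_WRITE_SCOPES:
                raise PolicyError(f"scope {scope!r} not permitted")
            if _budget["spent_usd"] + cost_usd > MAX_SPEND_USD:
                raise PolicyError("spending cap exceeded")
            start = time.time()
            result = fn(*args, **kwargs)
            _budget["spent_usd"] += cost_usd
            # One log line per tool call: easy to diff when you swap models.
            print(json.dumps({"tool": fn.__name__, "args": kwargs,
                              "latency_s": round(time.time() - start, 3),
                              "spent_usd": _budget["spent_usd"]}))
            return result
        return wrapper
    return decorator

@guarded_tool(scope="tickets:create", cost_usd=0.10)
def create_ticket(title: str) -> dict:
    return {"ok": True, "ticket": title}   # stub; call the real tracker here

budget = {"spent_usd": 0.0}
create_ticket(title="Renew TLS cert", _budget=budget)
```

Because the policy lives in one place, tightening or pruning it later doesn’t touch prompts or the planner loop.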
How to iterate as models improve
- Remove routers and handoffs that no longer move the metric once a stronger model lands.
- Collapse multi-agent choreographies into a single loop if quality holds; reserve specialization for genuinely distinct domains.
- Simplify prompts: replace long instruction blocks with concise policies plus a few high-quality exemplars.
- Track why scaffolding exists: note the bug it fixed and delete it once the bug vanishes on newer models.
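One lightweight way to track that, sketched with made-up names (`SCAFFOLDING`, `repair_trailing_commas`, “model-X”): record why each piece of glue exists and count how often the failure it patches still occurs, so deletion becomes an evidence-based decision:

```python
import json

# Hypothetical ledger: every piece of glue records the failure it papers over
# and how often that failure still shows up.
SCAFFOLDING = {
    "repair_trailing_commas": {
        "reason": "model-X emitted trailing commas in JSON actions",
        "added_for_model": "model-X-2024-05",
        "still_needed": 0,
    },
}

def parse_action(raw: str) -> dict:
    try:
        return json.loads(raw)            # happy path: no glue involved
    except json.JSONDecodeError:
        SCAFFOLDING["repair_trailing_commas"]["still_needed"] += 1
        repaired = raw.replace(",}", "}").replace(",]", "]")
        return json.loads(repaired)

# After swapping models, rerun the evals and check the counter:
# if "still_needed" stays at 0, the workaround has earned deletion.
```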
A quick upgrade playbook
- Baseline with the simplest single-loop agent.
- Measure: task success, latency, tool error recovery.
- Swap in the stronger model; rerun the same evals.
- Prune any router, planner, or guardrail that no longer earns its keep.
- Reinvest saved complexity into data quality, monitoring, and tool reliability.
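A sketch of the first three playbook steps, reusing the hypothetical `run_agent` loop from the blueprint above; the task list, checks, and model handles (`current_llm`, `stronger_llm`) stand in for your own offline evals:

```python
import statistics
import time
from typing import Callable

Agent = Callable[[str], str]   # goal in, final answer out

# Held-out tasks with a simple pass/fail check each; real suites are larger
# and should also exercise tool-error recovery.
EVAL_TASKS = [
    {"goal": "What year did Apollo 11 land on the Moon?",
     "check": lambda answer: "1969" in answer},
]

def evaluate(agent: Agent) -> dict:
    successes, latencies = 0, []
    for task in EVAL_TASKS:
        start = time.time()
        answer = agent(task["goal"])
        latencies.append(time.time() - start)
        successes += int(task["check"](answer))
    return {"success_rate": successes / len(EVAL_TASKS),
            "p50_latency_s": statistics.median(latencies)}

# Same evals, two models; prune scaffolding only when the deltas say you can.
# baseline  = evaluate(lambda goal: run_agent(current_llm, goal))
# candidate = evaluate(lambda goal: run_agent(stronger_llm, goal))
```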
The takeaway: assume your models will get better. Build the minimum scaffolding that keeps users safe and the system observable, and let smarter models do more of the work as soon as they arrive.