Joel Horowitz
Agentic AI · Context · 2026

Bigger Isn't Smarter: What the Context Window Race Gets Wrong

Why most reasoning needs less context than you think — and what that means for how we build

There is a question that sounds like an engineering optimization problem but turns out to be something more fundamental. When you are building agentic AI systems — systems where multiple models collaborate, delegate, and hand off work to each other — you are forced to answer it precisely: what context does an agent need, exactly, to solve this problem?

Get it wrong in one direction and the agent drowns in irrelevant information, loses focus, produces noise. Get it wrong in the other and it operates on incomplete information and fails confidently — which is worse than failing visibly.

The interesting thing is that this question didn't originate in AI engineering. It's the same question a good manager asks every time they delegate. And the fact that agentic systems are forcing us to answer it rigorously is making visible something that human organizations have been doing sloppily for a long time.

  • How agentic systems evolve from single loops to trees — and why lazy context is the key architectural move
  • Why "what context does this agent need?" is just another name for precise problem decomposition
  • How effective delegation and agentic orchestration fail in exactly the same ways — for exactly the same reasons
  • Why optimal context transfer includes some redundancy — the confidence and calibration functions of overlap
  • Why intelligence, human or artificial, scales through context curation, not accumulation

I. The Arc of Agentic Complexity

From Loops to Forests

The natural progression when building agentic systems mirrors the progression of any new engineering paradigm. You start simple.

A single agent in a loop: give it a task, let it iterate, collect the result. It works. But you notice things. The context window grows with every turn. The model starts losing focus on the original objective. Early information — which seemed important at the time — starts polluting later reasoning. And the cost, in tokens and latency, compounds.
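The compounding cost is easy to see with a little arithmetic. What follows is a minimal sketch, assuming the whole thread is re-read on every turn with no caching or summarization; the function name `loop_cost` and the 500-token turn size are illustrative, not measurements.

```python
def loop_cost(turn_sizes):
    """Simulate a single-agent loop where every turn appends to one thread.

    Assumption: the model re-reads the full thread on every turn, so the
    tokens processed on turn k equal the size of the context at turn k.
    """
    context = 0
    context_per_turn = []
    for size in turn_sizes:
        context += size                   # the thread only ever grows
        context_per_turn.append(context)  # tokens the model reads this turn
    return context_per_turn, sum(context_per_turn)


# Ten turns of 500 tokens each (illustrative numbers).
per_turn, total = loop_cost([500] * 10)
print(per_turn[-1])  # 5000 tokens in the final context
print(total)         # 27500 tokens processed across the whole loop
```

The thread holds 5,000 tokens at the end, but the loop has processed 27,500. Under this assumption, linear growth in context becomes roughly quadratic growth in total work.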

The industry's response has been to expand the container rather than rethink what goes in it. A new arms race has taken hold, centered on the context window — the maximum amount of text a model can consider at once. In 2018, maximum context windows were 512 tokens. Today, Llama 4 Scout offers 10 million. But research shows effective context is only 50–65% of what's advertised, and performance degrades as you fill more of the window. The arms race is solving the wrong problem.

The first instinct is to add more agents. Multi-agent panels — asking several models the same question and collating the results — introduce something genuinely valuable: disagreement as signal. When models converge, you have higher confidence. When they diverge, you have a flag. The arbitration problem (how do you collate?) is itself interesting, but it's a second-order problem.
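As a rough illustration of disagreement as signal, here is a minimal sketch that collates a panel by majority vote and raises a flag when agreement is low. The function name `panel_signal`, the 0.75 threshold, and the assumption that answers are short comparable strings are all hypothetical; real arbitration is considerably harder.

```python
from collections import Counter


def panel_signal(answers, threshold=0.75):
    """Collate answers from several models into a verdict plus a flag.

    Assumption: answers are short strings that can be compared after
    trimming whitespace and lowercasing.
    """
    if not answers:
        raise ValueError("panel needs at least one answer")
    normalized = [a.strip().lower() for a in answers]
    top_answer, votes = Counter(normalized).most_common(1)[0]
    agreement = votes / len(normalized)
    return {
        "answer": top_answer,
        "agreement": agreement,
        "flag": agreement < threshold,  # divergence is the signal
    }


converged = panel_signal(["Paris", "paris ", "Paris"])
diverged = panel_signal(["retry", "abort", "retry", "escalate"])
```

In the first call all three answers agree, so nothing is flagged. In the second only half the panel agrees, so the result is flagged for a closer look rather than silently accepted.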

The deeper insight comes later, when you start thinking about the shape of a reasoning process. Most implementations treat an agentic conversation as a thread — one growing context that everything gets appended to. But a reasoning process isn't a thread. It's a tree. Branches form. Some branches need shared state. Many don't.

The MCP tool lookup is the canonical example. At some point in solving a problem, you need to know what tools are available for a particular subtask. There is no reason to drag the entire conversation history into that lookup. It's a clean subproblem: question in, recommendation out. Spin off a subagent with surgical context, get the answer, fold it back in. The branch never needed to know what came before it.

In production, the same “tool lookup in isolation” pattern is one of the things that gets wrapped in harnesses, evals, and hand-soldered reliability layers. That wrapping is the difference between a demo thread and infrastructure.

This is the architectural move that changes how you think about agentic systems: lazy context. Don't pass everything. Pass what the branch needs to produce its output. Everything else is noise.
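One way to picture lazy context is a branch that declares its dependencies up front and receives only those. This is a minimal sketch, not an MCP API: `Branch`, `lazy_context`, and the keys in `shared_state` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """One node in the reasoning tree (hypothetical structure)."""
    question: str
    needs: tuple = ()  # keys of shared state this branch depends on


def lazy_context(shared_state, branch):
    """Build the context for a branch from only the keys it declares."""
    missing = [key for key in branch.needs if key not in shared_state]
    if missing:
        # Fail visibly instead of letting the branch guess.
        raise KeyError(f"branch is missing required context: {missing}")
    context = {key: shared_state[key] for key in branch.needs}
    context["question"] = branch.question
    return context


shared_state = {
    "history": ["...many earlier turns..."],
    "objective": "migrate the billing service",
    "subtask": "parse a CSV export",
}

tool_lookup = Branch(
    question="Which available tools fit this subtask?",
    needs=("subtask",),
)
context = lazy_context(shared_state, tool_lookup)
```

The tool lookup sees the subtask and the question, and nothing else. The conversation history stays in the parent, and a branch that asks for something the parent does not have fails loudly instead of confidently.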

II. The Epistemological Question

What Is a Problem About?

"What context does this agent need?" turns out to be equivalent to a deeper question: what is this problem about? What are its actual dependencies?

This is not a new question. It's what compilers do with variable scope — enforcing that a function can only see what it needs to see. It's what mathematicians do when they state a lemma — they list exactly the assumptions required, no more. It's what interface design in programming makes explicit: an interface tells you everything you need to know about an object, and deliberately hides everything you don't. It's what a good lawyer does when isolating the legal question from the facts. Minimum sufficient context is just another name for precise problem decomposition.

What's interesting is that humans are genuinely bad at this in practice. We over-share context constantly — partly for social reasons, partly because we don't know in advance which details will matter, and partly because the cost of over-sharing in conversation is low. A few extra sentences. A bit of the listener's attention.

In agentic systems the cost is explicit and real. Tokens. Latency. Attention dilution — the model equivalent of a team meeting where half the information is irrelevant to half the attendees. Agentic engineering is forcing a rigorous version of something humans do sloppily by default, and in doing so it's making the sloppiness visible for the first time.

The key insight that emerges from building these systems: most reasoning is more local than it appears. We carry more context than we need because pruning is hard and the cost of under-sharing feels higher than the cost of over-sharing. But the locality is real. Most subproblems, examined carefully, depend on far less than we instinctively include.

III. Delegation as Context Engineering

What Good Managers Have Always Done

The conventional framing of delegation is about trust, or workload distribution. A manager delegates because they can't do everything, and because they trust someone to handle a piece. That framing isn't wrong, but it's incomplete. It misses what the hard part actually is.

The hard part of delegation is epistemic. It's figuring out exactly what information transfer needs to happen for the other person to operate autonomously and correctly.

A bad delegator fails in one of two symmetric ways. They over-share — dumping everything they know, overwhelming the person, burying signal in noise. Or they under-share — assuming the other person has context they don't, leaving them making decisions on incomplete or wrong assumptions. Both failure modes are common. Both produce bad outcomes. They're just bad in different ways.

A good delegator does something precise. They model what the other person already knows — their base context, derived from their role, their experience, their existing familiarity with the situation. They subtract that from the total context needed. And they transfer exactly the delta.

This maps almost perfectly onto the agentic orchestration problem. The orchestrator models what the subagent already "knows" by virtue of its system prompt and role. It adds the minimum task-specific context. It passes only that. The failure modes are identical: an overwhelmed subagent producing unfocused output, or an under-informed one failing confidently in the wrong direction.
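The "transfer exactly the delta" idea and its two failure modes can be sketched with plain dictionaries and sets. This assumes context can be modeled as named facts and that the orchestrator's model of what the subagent knows is accurate; `context_delta`, `audit_handoff`, and the example facts are hypothetical.

```python
def context_delta(needed, base_context):
    """Return the part of `needed` that the receiver does not already hold."""
    return {key: value for key, value in needed.items() if key not in base_context}


def audit_handoff(passed, needed, base_context):
    """Classify a handoff: noise is over-sharing, gaps are under-sharing."""
    noise = set(passed) - set(needed)
    gaps = set(needed) - set(base_context) - set(passed)
    return noise, gaps


# Everything the task requires in total (illustrative).
needed = {
    "role": "code reviewer",
    "style_guide": "PEP 8",
    "file": "billing/invoice.py",
    "concern": "rounding errors in tax totals",
}
# What the subagent already has from its system prompt and role.
base_context = {"role", "style_guide"}

handoff = context_delta(needed, base_context)
good = audit_handoff(handoff, needed, base_context)

sloppy = {"file": "billing/invoice.py", "lunch_menu": "tacos"}
bad = audit_handoff(sloppy, needed, base_context)
```

The clean handoff carries only `file` and `concern` and audits to no noise and no gaps. The sloppy one fails both ways at once: `lunch_menu` is noise, and the missing `concern` is a gap the subagent would have to guess at.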

The parallel explains something that might otherwise seem like a coincidence: bad managers and bad agentic systems fail in exactly the same ways. The structure of the failure is the same because the underlying problem is the same. Effective delegation — human or artificial — is context curation.

IV. The Overlap Principle

Why Optimal Isn't Minimal

There is a temptation, once you've understood context curation as the core of delegation, to optimize toward the minimum. Transfer exactly the delta. Nothing more.

But this breaks down in the human case in a way that turns out to be instructive.

When a manager delegates to a person, they often include some overlap — context the person already has. This looks like redundancy. From a pure information-theoretic standpoint, it is. But it serves two functions that are anything but redundant.

The first is psychological. Hearing that you already know some of what's needed to handle a task is confidence-building. It signals: you have handled things like this before, you know how to approach this. That's not noise — it's scaffolding. Motivated agents perform differently than pure information processors, and confidence affects performance.

The second is epistemic. When the manager mentions something the person already knows, and it matches their existing model, it provides calibration. A small signal that says: my understanding of this situation is accurate, I can trust my other assumptions. The redundancy lets the person verify that their base context is correctly loaded before relying on it.

Optimal context transfer isn't minimal context transfer. It's minimal new context, plus just enough overlap to anchor confidence and verify calibration. The overlap is load-bearing — just in a different register than the new information.

Whether this principle has a direct analog in agentic systems is an open question. There may be a version of it in prompt engineering — restating things a model "should already know" to activate the right frame, increase reliability, reduce the chance of the model operating on a miscalibrated assumption. The redundancy isn't purely wasted.
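If the principle does carry over, one way to picture it is a handoff that separates new facts from a few deliberately restated ones, so the receiver can check the restated facts against what it already believes. This is a speculative sketch under that assumption; `handoff_with_overlap`, `calibration_mismatches`, and the example facts are hypothetical.

```python
def handoff_with_overlap(needed, base_context, anchors=()):
    """Split a handoff into new facts plus a few deliberately restated ones.

    Assumption: `anchors` names facts the receiver already holds that are
    worth restating so it can check its own picture of the situation.
    """
    new = {k: v for k, v in needed.items() if k not in base_context}
    overlap = {k: needed[k] for k in anchors if k in needed and k in base_context}
    return {"new": new, "overlap": overlap}


def calibration_mismatches(overlap, receiver_beliefs):
    """Return restated facts that contradict what the receiver believed."""
    return {
        k: (receiver_beliefs.get(k), v)
        for k, v in overlap.items()
        if receiver_beliefs.get(k) != v
    }


needed = {"client": "Acme", "deadline": "Friday", "deliverable": "Q3 report"}
base_context = {"client", "deadline"}

handoff = handoff_with_overlap(needed, base_context, anchors=("deadline",))

# The receiver's prior picture is slightly stale (illustrative).
receiver_beliefs = {"client": "Acme", "deadline": "Monday"}
mismatches = calibration_mismatches(handoff["overlap"], receiver_beliefs)
```

Here the only new fact is the deliverable, but restating the deadline exposes a stale belief (Monday versus Friday) that a strictly minimal handoff would have left uncorrected.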

V. Intelligence Scales Through Curation

The Instinct and the Discipline

The instinct when facing a hard problem is to gather more information. More context feels like more capability — more raw material to reason from, more surface area to find the answer on. This instinct is wrong often enough to be worth examining.

More context is only valuable if the additional information is relevant and the reasoning process can isolate what matters. In practice, beyond a certain threshold, additional context degrades rather than improves output — in humans and in models. The relevant signal gets harder to find. Earlier information anchors reasoning in ways that may not serve the current question. The cognitive or computational cost of processing irrelevant information is real.

The discipline — the thing that's actually hard — is knowing when you have enough, and packaging it precisely for whoever or whatever is doing the next step.

Intelligence — human or artificial — scales through context curation, not context accumulation. The manager who delegates well isn't the one who shares the most. It's the one who shares the right things. The agentic system that performs well at scale isn't the one with the largest context window. It's the one that knows what to put in it.

This is what building agentic systems teaches you, if you build enough of them. Not that AI is powerful, which everyone already knows. But that the structure of effective reasoning — the way problems decompose, the way context should flow, the way intelligence distributes across agents — is the same whether the agents are models or people.

The engineering problem turns out to be a window into something older. How to think well. How to delegate well. How to know what you need to know — and nothing more.

That seems worth understanding.

Yet for all the sophistication of modern agentic systems, context curation remains largely unsolved as a discipline. Most implementations still fall back to two crude modes:

  • Brute-force the window. Pass the full thread, flood context, and rely on attention to sort signal from noise.
  • Hardcode at build time. An engineer decides once what each agent sees. Better than nothing, but brittle: it doesn't adapt as tasks evolve, and it bakes in fixed assumptions about what matters.

The open problem is how to curate context well: decomposing precisely enough that each branch gets what it actually needs, and continuing to do so as the work changes.

That may be one of the important engineering problems of the next few years — not because the missing piece is exotic tooling, but because it demands something closer to epistemic discipline: knowing not just how to reason, but what to reason about.

Agentic workflows, multi-agent panels, and what context a subagent actually needs — April 2026