Oleksii Siniaiev
Post page navigation
Articles 11 min read June 21, 2026

How to tell your AI agent lost context: a state-canary in AGENTS.md and CLAUDE.md

A state-canary is a simple observability pattern for AI coding agents: one state line per response helps you catch context drift, lost scope, and forgotten constraints early.

Прев'ю статті про state-canary для AI-агентів: термінал із маркером стану task, scope і constraint, що допомагає помітити втрату контексту.
On this page

TL;DR

  • Make the agent print one state line at the start of every response: [state: task=…; scope=…; constraint=…]. While the line stays correct, you have a signal that the key instructions and task state are still being held.
  • When a field turns to unknown, scope widens on its own, or task gets swapped — that is a visible sign of context drift, which you catch before the agent writes wrong code.
  • A state-canary is an observability signal, not anti-hallucination magic. It catches lost instructions and task drift, but not confident factual lies. When it trips — reset the session and recover from markdown.

The most expensive mistake an AI agent makes is not the one it stumbles on and asks for help with. The expensive one is the mistake it makes confidently: it references a function that does not exist, “remembers” a constraint you never gave, rewrites something you agreed five turns ago not to touch. The tone of the answer does not change. Only the quality does — and you find out at review, after you have already spent an hour.

I have caught myself in the same loop several times: in a long session the agent starts inventing paths to files it read twenty turns earlier. The root cause is almost always the same — lost working context, not a “dumb model.” The problem is that the loss is invisible: there is no moment when the screen turns red and says “context degraded.”

This article is about a cheap observability trick that makes that loss visible: the state-canary — a status line the agent prints at the start of every response, directly per an instruction in AGENTS.md or CLAUDE.md. By the end you will have a copy-paste block you can drop into a project today, plus a protocol for when the canary trips.

If you have not built the agent’s base memory yet, start with AI agents in a developer’s workflow — it covers the AGENTS.md/CLAUDE.md pattern everything here builds on.

Who this helps — and when not to bother

The technique is not universal, and it is fairer to draw the boundary up front.

Especially useful if you

run long, multi-step tasks with AI coding agents; keep task state in markdown; work with Claude Code, Codex, or Cursor; and regularly catch confident mistakes after long sessions.

And the opposite — do not spend time on it when:

  • you ask a short one-off question or solve a 5–10 minute task;
  • the agent does not touch code at all (explanation, review, search);
  • checking facts matters more than tracking context — there the safeguard is tests and sources, not a canary.

The problem: the agent is confident but no longer remembers the agreements

What is dangerous is not the mistake itself but its invisibility. The agent keeps an even tone whether it is working from current context or from its scraps. So you need an external indicator — something cheap and always in view that breaks before the result does.

The good news: the agent already picks up project instructions from AGENTS.md and similar files. You can use that same machinery as a sensor — ask it to constantly show what it is currently holding in its working state.

Why working context gets lost

Modern Claude and Codex models can work well when the needed context is physically in the window. But some of the confident mistakes in long sessions come not from a “dumb model” but from the working context being pushed out, blurred, or used less effectively. There are several mechanisms.

Window overflow. Every model has a limit on the tokens it holds at once. A session that runs hundreds of turns pushes out the early messages: opening instructions, agreed constraints, and decisions slide off the edge of the window.

Compaction and summarization. Claude Code compresses long history — via /compact or automatically when the window fills. Precision matters here, because the generalization “/compact loses CLAUDE.md” is wrong. The project-root CLAUDE.md is re-injected after compaction; but nested CLAUDE.md files in subdirectories do not come back on their own until the agent reads the file in that directory again. And most importantly — instructions that lived only in the conversation, plus details of the current task, can drop out of the effective working state even when the root memory has been re-read. That is why I keep the active task not in the chat but in a separate markdown file, and re-read it after a reset or compact.

Context rot. Even without hard truncation, accuracy falls as context grows — Anthropic’s context-window documentation explicitly describes recall degrading as volume increases. The decline is gradual: there is no point where “everything broke,” which is exactly why you cannot spot it by eye.

Lost in the middle. An instruction can physically be in the window yet sit in its middle. Models reproduce what is at the start and end of context more reliably than what is squeezed in the middle — the effect described in the “Lost in the Middle” study. An important rule surrounded by a large volume of text is followed worse.

Level 1. The simplest canary

Coal mines kept canaries: the bird reacted to dangerous air before people did. Same idea here — a cheap indicator that breaks first.

Put an instruction in AGENTS.md that is easy to verify at a glance, does not affect the task result, and is specific enough not to match by accident:

Markdown
## Output format

Start every response with: "Alex,"
No exceptions, even for one-line answers.

While the prefix is there, the top instructions in the file are still being followed. Here is what a failure looks like in practice:

Prompt
Turn 4:  Alex, updated the routes, here is the diff...
Turn 19: Done. Fixed the config, regenerated the cache...
         ^ the prefix is gone — time to check what else slipped

Important: keep this instruction short. Codex builds an instruction chain from several AGENTS.md files, and the project guidance has a size limit. The canary should be a single line, not a separate policy.

That is already enough to put you on alert. But a name prefix only records the fact of a failure, not its content.

Why the name is only the first level

A name gives a binary answer: the rule is followed or not. It will not tell you what exactly slipped — the task, the allowed boundary, or the project map. To make the canary diagnostic, you have to put state into it.

Level 2. State-canary: task / scope / constraint

Have the agent print one line reflecting the active context:

Prompt
Before answering, print exactly one line:

[state: task=<task-slug>; scope=<allowed-area>; constraint=<main-constraint>]

If any field is unclear, write "unknown".
Do not invent missing state.

The three fields are not arbitrary: task answers “what are we on,” scope — “where code may be touched,” constraint — “what must not be done.” Here is how it reads on a real task:

Code
[state: task=paywall-refactor; scope=WordPress/Timber templates only; constraint=no vendor patching]

Found a place where the paywall logic is still tied to a vendor override...

If on the next turn constraint becomes unknown or scope grows on its own, you will see it before the agent reaches into vendor code. The rule Do not invent missing state is key here: better it honestly writes unknown than paints a plausible field out of inertia.

A normal turn:

Code
[state: task=paywall-refactor; scope=WordPress/Timber templates only; constraint=no vendor patching]

Checking only the Timber templates, not touching vendor.

A suspicious turn:

Code
[state: task=paywall-refactor; scope=unknown; constraint=unknown]

I can edit the vendor file directly...

That is the moment to stop: the agent still speaks confidently, but the marker already shows that the working boundaries of the task are lost.

Why not a turn counter

It is tempting to add a turn counter to the marker — it looks tidy and “observable.” I do not use it: the model can keep counting mechanically, reset it, or fudge it, all while the context has already degraded. task, scope, and constraint are harder to fake out of inertia — so their diagnostic value beats a pretty number.

How to read the signal

The value of a state-canary is that different ways the marker breaks mean different things and call for different actions.

SymptomWhat it may meanWhat to do
The whole marker is goneThe agent stopped following the top instructionDo not continue blindly, check the state
task changed without a commandThe agent mixed up tasksRestore the task-slug or start a new session
constraint=unknownThe agent is unsure of the current constraintRe-read the task md and scope
Marker is correct, but the facts are wrongThis is a factual hallucination, not context driftVerify via tools, tests, sources
The agent started inventing pathsLost the project mapForce a find/ls/read, do not trust memory

What the canary catches, and what it does not

A state-canary is an engineering sensor, not a proof. The boundary is simple and worth keeping in mind:

  • Catches context drift, instruction drift, and lost task state — cases where instructions have slid out of the effective context.
  • Does not catch factual hallucination: if the agent confidently lies with a correct marker, the source is the model’s training prior, not lost context.
  • Does not replace tests, tools, source verification, and code review. It is an early signal on top of them, not instead of them.
Failure classDoes the canary help?
Window overflow, old instructions pushed outYes
Drift after compaction/summarizationYes
Context rot (gradual recall degradation)Partly
Lost in the middle (instruction present but ignored)Partly
Factual hallucination with a correct markerNo
A requirement misread from the very startNo

When you need a hook, not a canary

If a rule must not just appear in the response but be guaranteed to run at a specific moment — for example, running tests before a commit or forbidding writes to certain files — use hooks or automated checks. A canary shows that the agent remembers a constraint; a hook forces the environment to perform the action regardless of what the agent “remembers.”

Level 3. The recovery protocol

The marker is gone, changed on its own, or showed unknown. The main thing I do not do is try to fix a long session with yet another long prompt: that only adds tokens without removing the degradation. Instead — reset and recover from markdown:

Prompt
If the state marker disappears, changes unexpectedly, or shows "unknown":
1. Stop the current task.
2. Summarize current state into docs/analysis/<task>.md.
3. Start a fresh session.
4. Re-read AGENTS.md/CLAUDE.md and the task markdown.
5. Continue from the last verified step.

That is why my active tasks live in docs/analysis/<task-slug>.md: the file serves as both the original analysis and a restore point. After /clear, the agent re-reads it and picks the thread back up in one turn — without replaying the whole conversation.

A ready block for AGENTS.md / CLAUDE.md

Copy into the root AGENTS.md or CLAUDE.md and substitute your own values:

Markdown
## Agent state canary

At the start of every response, print exactly one line:

[state: task=<current-task>; scope=<allowed-area>; constraint=<main-constraint>]

Rules:
- Keep it to one line.
- Use "unknown" if unsure.
- Do not invent missing state.
- If the user changes task, update the task field.

If the marker disappears, changes unexpectedly, or shows "unknown":
1. Stop the current task.
2. Summarize state into docs/analysis/<task>.md.
3. Start a fresh session.
4. Re-read this file and the task markdown.
5. Continue from the last verified step.

FAQ

Will the marker break the agent’s answers?

No, if it is one line. The agent prints it first, then answers as usual. You either do not notice the line or you immediately see an anomaly.

Does this work outside Claude — with Codex or Cursor?

Yes, if the tool regularly picks up project instructions from AGENTS.md, CLAUDE.md, or an equivalent. It is not a system-level guarantee: Claude Code itself notes that CLAUDE.md is not a system prompt but an instruction the model tries to follow. That is exactly why a canary is needed as an observable signal, not as a hard safeguard.

What if the agent is “lazy” and skips the marker while context is fine?

It happens. So a single miss is a reason to be alert, not to reset the session. The signal is a marker gone for two or three turns in a row, or turned unknown.

Do you need a canary in short sessions?

No. Over 20–30 turns in a small project, degradation is unlikely. The value grows on long, multi-step tasks.

How do you tell lost context from an ordinary mistake?

By the pattern: the agent stopped following several rules at once, re-asks settled questions, confuses file paths. Three or four signals together are no longer a coincidence.

Sources and further reading

Share this article

LinkedIn X Email

Get in touch

Working on something similar? Let's talk.

I am always open to discussing architecture, Laravel, WordPress, performance, and practical implementation problems.

Send a message See selected work

Explore more

Articles

June 14, 2026

Part 3. A month with an AI journal: finding patterns in sleep, stress, and workouts

How to analyze an AI journal after the first month: transcription fixes, honest reflection…
Articles

June 14, 2026

Part 2. Hermes Agent + DeepSeek on Ubuntu: a complete Telegram AI journal setup

A step-by-step setup for Hermes Agent and DeepSeek on Ubuntu: a locked-down Telegram bot,…
Articles

June 14, 2026

Part 1. How I turned an old gaming laptop into an AI wellbeing journal

How an old Xiaomi Mi Gaming Laptop became a home AI server: Hermes Agent,…