Mining Your Agent's Mistakes

TL;DR: 24 incidents, 6 patterns, most of them unsolvable with hard rules. The two biggest failures were already addressed by skills we’d deployed but never used. The gap was activation, not capability. Platform-native incident logging for Agent Amber is coming.

A One-Line Feedback Loop

It started with annoyance. During development sessions, our Claude Code agents would make mistakes (wrong port numbers, guessed fixes, commands missing flags) and we’d correct them in the moment, then forget about it. The same mistakes would reappear next session. No feedback loop.

The fix was a slash command called /incident that appends a single JSON line to a file. One record, pretty-printed here for readability:

{
  "timestamp": "2026-03-22T02:02:04Z",
  "description": "claude said beehave is on 8000 but its on 3000",
  "session_id": "4037e3c4-dc36-4948-ad89-ac20c080395f",
  "cwd": "/Users/twright/vanilla-agent"
}

No forms, no severity levels. When the agent does something wrong, type /incident, describe it, keep working. The session ID links back to the full conversation transcript for replay later. If flagging takes more than five seconds, you won’t do it at midnight mid-debug.
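
A minimal sketch of the append step, assuming a Python helper and a log file named incidents.jsonl. Both are hypothetical, since the command's actual implementation isn't shown here, and how the slash command invokes the helper is left out.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path


def log_incident(description: str, session_id: str,
                 log_path: Path = Path("incidents.jsonl")) -> dict:
    """Append one incident as a single JSON line and return the record."""
    record = {
        # UTC timestamp in the same shape as the example record above
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "description": description,
        "session_id": session_id,
        "cwd": os.getcwd(),
    }
    # Append mode keeps earlier incidents intact; one record per line
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Appending one line per record means there is nothing to parse or rewrite on each call, which is what keeps flagging cheap.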

The Six Patterns

After four days: 24 incidents across nine sessions. We ran parallel analysis agents across the transcripts and looked at all 24 together. Every failure fit at least one of six patterns; some fit more than one, which is why the per-pattern counts below add up to more than 24.
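
Before any analysis, the log has to be grouped by session so each transcript can be pulled up alongside its incidents. A rough sketch, assuming the one-record-per-line format described earlier (the function name and file layout are illustrative, not the original tooling):

```python
import json
from collections import defaultdict
from pathlib import Path


def incidents_by_session(log_path: Path) -> dict[str, list[dict]]:
    """Read a JSON-lines incident log and group records by session_id."""
    grouped = defaultdict(list)
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the log
            record = json.loads(line)
            grouped[record["session_id"]].append(record)
    return dict(grouped)
```

Each key is then a transcript to replay, and each value is the list of moments in it worth looking at.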

1. Confidently Wrong

Ten incidents, the most common by far. The agent states facts without checking. It says a service runs on port 8000 when it’s on 3000. It says a feature “isn’t implemented” without ever searching the codebase. The agent has two ways to answer: check (using tools) or recall (from training data). Recalling is instant. Checking costs tokens. Under pressure, the agent defaults to recall, and recall is often wrong.
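
To show how cheap the "check" path can be, here is a small sketch of verifying a port claim instead of recalling it. The helper name is hypothetical, and it only tells you that something is accepting TCP connections, not which service it is.

```python
import socket


def port_is_listening(port: int, host: str = "127.0.0.1",
                      timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return s.connect_ex((host, port)) == 0
```

Calling port_is_listening(8000) and port_is_listening(3000) before answering would have settled the port question from the example incident with evidence rather than memory.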

2. Wrong System Identity

Seven incidents. We run multiple instances of the same agent system in different directories. The agent confuses which is which: it searches the wrong path, references decommissioned ports, or presents upstream git commits as local work. It repeated the mistake three times in a row, despite correction.
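
One way to give the agent ground truth here is a reference table that maps directories to instances. The sketch below is illustrative only: the instance names, paths, and ports are made up, and the lookup simply picks the most specific root that contains the working directory.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class Instance:
    name: str
    root: PurePosixPath
    port: int


# Hypothetical entries; a real table would list the actual instances
INSTANCES = [
    Instance("local-dev", PurePosixPath("/srv/agents/dev"), 3000),
    Instance("staging-mirror", PurePosixPath("/srv/agents/staging"), 3100),
]


def resolve_instance(cwd: str, instances=INSTANCES) -> Optional[Instance]:
    """Return the instance whose root contains cwd, or None if none does."""
    path = PurePosixPath(cwd)
    matches = [i for i in instances if path == i.root or i.root in path.parents]
    # Prefer the deepest root so nested instances resolve correctly
    return max(matches, key=lambda i: len(i.root.parts), default=None)
```

Returning None for an unknown directory matters as much as the hits: it is the signal to ask rather than guess.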

3. Propose-and-Pray Debugging

Four incidents. The agent spots something that looks wrong, declares it the root cause, proposes a fix. Doesn’t work. Finds another thing, repeats. In one session, five sequential root causes were proposed for a 404 error before the agent actually traced the request path. The fix was a configuration flag that had been there the whole time.

4. Unauthorised Actions on Shared Environments

Three incidents. A user asks “why can’t I see this skill in the staging UI?” The agent investigates, finds the skill isn’t in the database, and runs the seed script. The seed creates all 22 default skills. Nobody asked the agent to fix anything. In another case, the agent guessed at production deploy commands by trial-and-error on a live system because it didn’t check the CI/CD workflow first.
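
A runtime guard is one way to make a script like the seed refuse to run on a shared environment unless someone explicitly authorised it. This is a sketch under assumptions: the environment names, the exception, and the seed function are all hypothetical stand-ins.

```python
SHARED_ENVIRONMENTS = {"staging", "production"}  # assumed names


class UnauthorisedActionError(RuntimeError):
    """Raised when a mutating action targets a shared environment unconfirmed."""


def require_authorisation(action: str, environment: str,
                          confirmed: bool = False) -> None:
    """Block mutating actions on shared environments unless confirmed."""
    if environment in SHARED_ENVIRONMENTS and not confirmed:
        raise UnauthorisedActionError(
            f"'{action}' on {environment} needs explicit user confirmation"
        )


def seed_default_skills(environment: str, confirmed: bool = False) -> str:
    # Stand-in for a real seed script; the guard runs before any write
    require_authorisation("seed default skills", environment, confirmed)
    return f"seeded defaults in {environment}"
```

The guard lives in the script, not in the agent's instructions, so it holds even when the agent is sure it is being helpful.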

5. Bypassing Application Abstractions

Three incidents. The agent bypassed the skills architecture by dumping skill content directly into a project instructions file. In another case, it pre-created a project directory via SSH; when the user tried creating the project through the UI it failed, because the UI expects to create the directory itself.
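
The directory failure is easy to reproduce in miniature. The sketch below assumes the UI's create step behaves like a plain mkdir that insists on a fresh path; that is a guess at the mechanism, not the application's actual code.

```python
import os


def create_project(base_dir: str, name: str) -> str:
    """Create a fresh project directory; fail if one already exists."""
    project_dir = os.path.join(base_dir, name)
    # os.mkdir raises FileExistsError when the path is already there,
    # which is what a directory pre-created out of band would trigger
    os.mkdir(project_dir)
    return project_dir
```

Going through the application's own create path is the only way to guarantee the state it expects.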

6. Inverted Confidence

Two incidents. The agent correctly identified that a project used Convex Cloud, based on the .convex.cloud domain in its environment variables. The user relayed a third party’s claim that it was self-hosted. The agent immediately folded, even though the evidence was right there. The inverse also occurred: the agent repeated incorrect information three times despite the user pushing back. Too confident when wrong, too quick to fold when right.

What Actually Prevents These?

The enforcement mechanisms available form a spectrum:

Prompt instructions  →  behavioural suggestion (weakest)
On-demand skills     →  task-specific rules, loaded when needed
Memory corrections   →  persistent across sessions
Stop hooks           →  gate at completion time
Middleware           →  gate at every response
Assert primitives    →  gate at code level (strongest)

Hard blocks only work for pattern-matchable commands. You can block git push --force with 100% reliability. You can’t write a regex that catches “the agent is about to state a port number without checking.” Five of the six patterns are about reasoning quality, not specific commands.
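
For the pattern-matchable case, a hard block can be as small as a list of regexes checked before a shell command runs. The rule list and function name below are illustrative, and wiring the check into a hook is left out.

```python
import re
from typing import Optional

# Each rule is (compiled pattern, reason). The pattern looks for a force flag
# within the same shell command segment as "git push".
BLOCK_RULES = [
    (re.compile(r"\bgit\s+push\b[^|;&]*\s(--force|-f)(\s|$)"),
     "force push is blocked"),
]


def check_command(command: str) -> Optional[str]:
    """Return the reason a command is blocked, or None if it is allowed."""
    for pattern, reason in BLOCK_RULES:
        if pattern.search(command):
            return reason
    return None
```

This sketch deliberately leaves --force-with-lease alone, and it does not catch a forced refspec such as git push origin +main, so a real rule list would need more entries. It also shows the limit described above: there is no equivalent pattern for "about to state an unverified fact".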

The more interesting finding is that two skills in our platform already address Patterns 1 and 3 directly: Verification Before Completion (no claims without fresh tool evidence) and Systematic Debugging (no fixes without root cause investigation first). Both well-written, both already deployed, both sitting unused.

Skills are on-demand. The agent must choose to load them. The agent about to confidently state a wrong port number isn’t going to pause and think “I should load the verification skill.” The failure happens at the moment of highest misplaced confidence.

What works for each pattern

Pattern	Best Mechanism	Why
Confidently wrong	Stop hook (semantic) + verification skill	Needs understanding, not regex
Wrong system identity	System prompt (reference tables)	Authoritative ground truth
Propose-and-pray debugging	Debugging skill (trace protocol)	Task-specific activation
Unauthorised shared env actions	Hard-block hooks + runtime guards	Pattern-matchable, high stakes
Bypassing abstractions	System prompt + memory corrections	Architectural judgement
Inverted confidence	System prompt rules	Pure behavioural

Where This Is Going

The analysis pointed to a tiered approach. Verification and debugging skills are enabled now as a baseline. Next is a prompt-based Stop hook that reviews the transcript for ungrounded factual claims before completing. After that, CRITIC-pattern verification middleware gating at every response, not just completion.
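
The shape of a semantic Stop hook can be sketched without committing to any particular model or hook API. In the sketch below, the transcript types and the judge callable are assumptions: the judge stands in for whatever model call decides which claims in the final answer lack supporting tool output.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Turn:
    role: str      # "user", "assistant", or "tool"
    content: str


@dataclass
class StopDecision:
    allow: bool
    reason: str = ""


# Hypothetical judge: takes the final answer and the tool outputs seen so far,
# and returns the claims it considers ungrounded
Judge = Callable[[str, Sequence[str]], list[str]]


def review_before_stop(transcript: Sequence[Turn], judge: Judge) -> StopDecision:
    """Gate completion on the judge finding no ungrounded claims."""
    answers = [t.content for t in transcript if t.role == "assistant"]
    if not answers:
        return StopDecision(allow=True)
    evidence = [t.content for t in transcript if t.role == "tool"]
    ungrounded = judge(answers[-1], evidence)
    if ungrounded:
        return StopDecision(False, "Verify before finishing: " + "; ".join(ungrounded))
    return StopDecision(True)
```

Because the judge is injected, the gate can be tested with a stub and swapped between a cheap heuristic and a model call without touching the control flow.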

We’re also building incident logging directly into Agent Amber, so users can flag frustrations mid-conversation without leaving the chat. Same principle as the CLI slash command, but platform-native and tied to a real user and session record. Coming soon.

The question isn’t “what rules should we write?” We’d already written them. The question is how to make verification mandatory instead of optional. The answer lives on a spectrum from cheap-and-leaky to expensive-and-airtight, with practical middle ground in semantic Stop hooks. The agents will keep making mistakes. The question is whether they make the same ones.