Blog

Notes on agent failure detection

Name: Pisama
Author: Pisama

2026-06-08 · multi-agent · architecture · agents · ai-engineering
Do You Actually Need a Multi-Agent System?
Multi-agent AI fails on 41–87% of tasks and costs roughly 15x more tokens than a single agent. A structured framework for deciding when the complexity is worth it.
2026-06-08 · multi-agent · failure-modes · agents · reliability
A Field Guide to Multi-Agent Failure Modes
The MAST taxonomy from Cemri et al. (NeurIPS 2025) classifies 14 failure modes from 1,642 annotated traces. A guide to what breaks, where it enters a trace, and what interventions have measured effect sizes.
2026-05-20 · competitive-landscape · cais-2026 · positioning
The agent failure detection category is officially crowded. Here is where Pisama fits.
CAIS 2026 maps out the agent reliability landscape. 15 named players. Two major acquisitions. Five new funded entrants in six months. Here is an honest read on the state of the category and Pisama's position.
2026-04-02 · multi-agent · observability · failure-detection
Why your multi-agent system fails silently, and how to detect it
Most multi-agent failures are silent: no exception, no log line, just a wrong answer or a stuck run. Here is the structural taxonomy and how to catch them.
2026-04-02 · evaluation · benchmarks · agents
Heuristic detectors vs LLM judges: what we learned analyzing 7,000 agent traces
7,212 labelled agent traces. We tested heuristic pattern matchers against frontier-LLM judges on the same failures. The heuristics won by 5x at zero cost.
2026-04-02 · agents · taxonomy · production
The 17 ways AI agents break in production
A taxonomy of agent failure modes derived from 7,212 labelled traces across LangGraph, CrewAI, AutoGen, n8n, and Dify.
