The agent failure detection category is officially crowded. Here is where Pisama fits.

Name: Pisama
Author: Pisama

CAIS 2026 maps out the agent reliability landscape. 15 named players. Two major acquisitions. Five new funded entrants in six months. Here is an honest read on the state of the category and Pisama's position.

The ACM Conference on AI and Agentic Systems opens in San Jose on May 26. The program is the most accurate single snapshot available today of where the agent reliability category actually is. We read all 63 papers and 46 demos. Here is the honest summary, including where Pisama sits and where we do not.

The map as of May 2026

Six months ago, the elevator pitch "we detect when your AI agents fail" was novel. As of May 2026 it is the framing used by Patronus (Percival, 20+ failure modes, $40M raised), Operama (Cornell-affiliated, public at CAIS this month), AgentPex (a Microsoft Research and University of Washington paper that product teams now cite as foundational), IBM ALTK (open-source middleware demoed at CAIS), and Pathfinder (a solo founder also on the CAIS program). That is five direct competitors using the same framing, before counting the adjacent layers.

Above that, the platform layer consolidated fast. Cisco announced its intent to acquire Galileo on April 9, 2026; the deal closes in Q3 and Galileo becomes part of Splunk Observability Cloud. ClickHouse acquired Langfuse on January 16 as part of a $400M Series D at a $15B valuation. AGNTCY, the Cisco-donated Internet-of-Agents standard, sits at the Linux Foundation with 65 member companies including Cisco, Dell, Google Cloud, Oracle, Red Hat, Galileo, and LangChain.

Below that, five new funded competitors have stood up since November: Traversal ($48M Series A and seed, Sequoia and Kleiner Perkins, Cornell Tech, framed as AI-powered SRE), Temporal ($300M at $5B valuation in February, a16z-led, durable execution for reliable agents), InsightFinder ($15M Series B in April, pivoting from IT infrastructure monitoring into agents), Raindrop ($15M seed, custom-trained per-product anomaly detection), and Sentrial (YC W26, telemetry plus auto-remediation). Each comes with credibility and capital.

That is 15 named players in a category that did not exist by name in 2024. We are not early to this market. We never were.

What Pisama uniquely has

What Pisama still has that the 15-player matrix says no one else has:

One. A public, per-detector calibration scoreboard: 84 detectors registered, 49 measured, 6 externally validated at production grade. Every competitor talks about 20-plus failure modes, 50-plus metrics, or sub-goal decomposition. None publish per-detector F1. The full taxonomy with F1 numbers is at pisama.ai/detectors; the calibration dataset is at github.com/Pisama-AI. This is the most defensible factual claim in the category.
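For readers who want to audit a scoreboard like this themselves, per-detector F1 can be recomputed from any set of labeled detections. The following is a minimal sketch, not Pisama's scoring code: the `Example` record, the detector name, and the sample rows are hypothetical, and the real calibration dataset may use a different schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One labeled calibration row (hypothetical schema)."""
    detector: str    # detector name, e.g. "loop" (illustrative only)
    predicted: bool  # did the detector fire on this trace?
    actual: bool     # did a human label the failure as present?


def per_detector_f1(examples):
    """Return {detector: (precision, recall, f1)} from labeled examples."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for ex in examples:
        c = counts[ex.detector]  # registers the detector even on true negatives
        if ex.predicted and ex.actual:
            c["tp"] += 1
        elif ex.predicted:
            c["fp"] += 1
        elif ex.actual:
            c["fn"] += 1

    scores = {}
    for name, c in counts.items():
        fired = c["tp"] + c["fp"]
        present = c["tp"] + c["fn"]
        precision = c["tp"] / fired if fired else 0.0
        recall = c["tp"] / present if present else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[name] = (precision, recall, f1)
    return scores


if __name__ == "__main__":
    rows = [
        Example("loop", True, True),
        Example("loop", True, True),
        Example("loop", True, False),
        Example("loop", False, True),
        Example("loop", False, False),
    ]
    for name, (p, r, f1) in per_detector_f1(rows).items():
        print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Scoring each detector separately, rather than pooling all detections into one number, is what lets a reader see which detectors are production grade and which are not.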

Two. Convergence detector: metric-aware failure detection (slope, regression, divergence). Every surveyed competitor detects on text. No CAIS paper or demo we read covers the metric-aware angle. Multi-agent systems with shared metrics (cost, latency, accuracy across turns) need this; nobody else ships it.
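To make "metric-aware" concrete, here is one way slope-based detection over a per-turn metric could look. This is a sketch under stated assumptions, not the Convergence detector's implementation: the function names, the `tolerance` and `window` parameters, and the exact definitions of "converging", "regressing", and "diverging" are illustrative choices made for this example.

```python
from statistics import mean


def slope(values):
    """Ordinary least-squares slope of values against their index."""
    n = len(values)
    if n < 2:
        return 0.0
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def classify_trend(values, lower_is_better=True, tolerance=0.01, window=3):
    """Label a per-turn metric series (assumed definitions, for illustration).

    diverging:  the whole series trends in the bad direction
    regressing: the series improved overall but the last `window` turns got worse
    converging: the series trends in the good direction and is still improving
    flat:       no trend beyond `tolerance`
    """
    sign = 1.0 if lower_is_better else -1.0
    overall = sign * slope(values)           # positive means getting worse
    recent = sign * slope(values[-window:])  # trend over the latest turns
    if overall > tolerance:
        return "diverging"
    if recent > tolerance:
        return "regressing"
    if overall < -tolerance:
        return "converging"
    return "flat"


if __name__ == "__main__":
    # Hypothetical per-turn cost or error values for a multi-agent run.
    print(classify_trend([10, 8, 6, 4, 2]))     # steady improvement
    print(classify_trend([10, 6, 3, 2, 3, 4]))  # improved, then got worse
    print(classify_trend([1, 2, 4, 8]))         # worse on every turn
```

The point of the sketch is that none of these signals is visible in the text of an agent's output; they only appear when the numeric series across turns is inspected.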

Three. An implementation of the AgentPex pattern in production. AgentPex is the CAIS 2026 paper from Microsoft Research and University of Washington that introduced procedural compliance checking via system-prompt rule extraction. Pisama's specification_compliance detector implements this pattern at F1 0.966 on our calibration set. Patronus and others reference the failure mode; only Pisama runs the algorithm.

Four. Vibe-coder ICP (ideal customer profile) focus. Patronus targets enterprise. Galileo and the Cisco motion target Splunk customers. Operama is framed for multi-agent operators with control-plane needs. Replit ships agents inside its own product (vertical, not horizontal). The vibe coder, meaning the Cursor and Claude Code user shipping agents in JavaScript and TypeScript on Vercel, is genuinely uncontested by direct competitors. This is where Pisama is going to live.

What Pisama still needs to ship

What Pisama does not have that the matrix says we should: OpenInference and OpenTelemetry semantic-convention compatibility that we can prove with a vendor-neutral conformance suite (Phoenix owns this standard, Galileo co-developed it); AGNTCY membership at the Linux Foundation (free to join, 65 companies in, including direct competitors); a published SDK integration matrix at the depth Patronus ships (smolagents, Pydantic AI, OpenAI Agents SDK, LangGraph, crewAI, each with a Colab notebook); a shipped TypeScript SDK (planned); and a Claude Code and Cursor CLI integration (Phoenix already shipped this). We will close each of these.

The honest read on the rest of the CAIS program: the consensus-collapse paper from the Jozef Stefan Institute, the persona-coherence work from MIT, the persuasion-evaluation framework from UIUC, and the "Does Safety Molt" paper from Foundation-AI all map directly to specific Pisama detectors. We will be citing them. The 4.7% to 11% TRAIL accuracy reported across SOTA frontier models in the Patronus benchmark, against our 59.9% with 20 heuristic detectors, is the comparison we will continue to lead with.

How to choose between us

If you are evaluating the agent reliability category right now, the short version is: Galileo and the Cisco motion if you are an enterprise Splunk customer; Langfuse if you want the OSS observability platform that ClickHouse now backs; Phoenix if you want the OpenInference-native observability layer that already ships a Claude Code CLI; Patronus if you want the managed AI-debugger product with TRAIL provenance; Operama if you want the Cornell research bet and are willing to wait for benchmarks; Pisama if you want the open detector pipeline with per-detector F1 you can audit before you deploy.

These six are not direct substitutes for one another. Most production teams will run a trace store (Langfuse or Phoenix) plus a detector layer (Pisama) plus a guardrail (Forge or Galileo Protect). The pages at pisama.ai/compare name each pair and where each wins.

We will be at CAIS May 26 through 29 in San Jose. If you are in the category, at the conference, or thinking about either, the door is open. Reach us at tuomo@pisama.ai.

More on the failure taxonomy at /taxonomy. Detector benchmarks at /benchmarks/detectors. Framework adapters at /frameworks.