Ext · Cross-cutting
Grounding Failure
Detects when output contains claims not supported by source documents. Agents achieve less than 45% accuracy on document-grounded tasks (OfficeQA benchmark).
Examples
- Agent extracts "$5.2M revenue" from a table, but the source shows $3.8M
- Agent attributes a data point to Column A when it's actually from Column B
- Agent fabricates a date not present anywhere in the source documents
- Agent confuses Company X's metrics with Company Y's across documents
Detection methods
- Numerical Verification
- Cross-checks extracted numbers against source values (5% tolerance)
- Entity Attribution
- Verifies data points are attributed to correct entities
- Ungrounded Claims
- Identifies claims with no source evidence
- Source Coverage
- Checks that output claims map to actual source content
Calibration accuracy
F1
0.671
Precision
0.636
Recall
0.710
From the Pisama calibration set. See detector scoreboard for the full table.
Detect this in production with the framework adapters (LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Claude Agent SDK, n8n, Dify). See the full taxonomy at /taxonomy.