Ext · Cross-cutting

Grounding Failure

Detects when output contains claims not supported by source documents. Agents achieve less than 45% accuracy on document-grounded tasks (OfficeQA benchmark).

Examples

  • Agent extracts "$5.2M revenue" from a table, but the source shows $3.8M
  • Agent attributes a data point to Column A when it's actually from Column B
  • Agent fabricates a date not present anywhere in the source documents
  • Agent confuses Company X's metrics with Company Y's across documents

Detection methods

Numerical Verification
Cross-checks extracted numbers against source values (5% tolerance)
Entity Attribution
Verifies data points are attributed to correct entities
Ungrounded Claims
Identifies claims with no source evidence
Source Coverage
Checks that output claims map to actual source content

Calibration accuracy

F1
0.671
Precision
0.636
Recall
0.710

From the Pisama calibration set. See detector scoreboard for the full table.

Detect this in production with the framework adapters (LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Claude Agent SDK, n8n, Dify). See the full taxonomy at /taxonomy.