Ext · Cross-cutting · Enterprise
Retrieval Quality
Detects when an agent retrieves wrong, irrelevant, or insufficient documents for a task. Retrieval is often the primary bottleneck in RAG (retrieval-augmented generation) systems.
Examples
- Agent retrieves marketing materials when the question is about engineering specs
- Agent retrieves 10 documents but only 2 are relevant to the query
- Critical document about pricing is missing from the retrieved set
- Query about 2024 Q4 results returns documents from 2023
Detection methods
- Relevance Scoring: Measures semantic alignment between query and retrieved docs
- Coverage Analysis: Detects gaps in topic coverage across retrieved documents
- Precision Measurement: Ratio of useful vs total retrieved documents
- Query Alignment: Semantic match between query intent and retrieved content
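As a rough illustration of the methods listed above, the sketch below scores relevance, measures precision, and flags coverage gaps. It is not the Pisama detector: all function names, the 0.5 threshold, and the sample documents are hypothetical, and plain token overlap stands in for the semantic (embedding-based) alignment a real detector would presumably use.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance_score(query: str, doc: str) -> float:
    """Fraction of query tokens found in the document.

    Assumption: lexical overlap is only a stand-in for semantic alignment.
    """
    q = tokens(query)
    if not q:
        return 0.0
    return len(q & tokens(doc)) / len(q)


def retrieval_precision(query: str, docs: list[str], threshold: float = 0.5) -> float:
    """Ratio of documents scoring at or above the (hypothetical) threshold."""
    if not docs:
        return 0.0
    relevant = [d for d in docs if relevance_score(query, d) >= threshold]
    return len(relevant) / len(docs)


def coverage_gaps(required_topics: list[str], docs: list[str]) -> list[str]:
    """Required single-word topics that appear in none of the documents."""
    covered = set().union(*(tokens(d) for d in docs))
    return [t for t in required_topics if t.lower() not in covered]


query = "engineering specs for pump"
docs = [
    "Engineering specs for the pump model X",
    "Marketing brochure for summer campaign",
]
precision = retrieval_precision(query, docs)
gaps = coverage_gaps(["pricing", "pump"], docs)
```

Here the marketing brochure falls below the threshold, so precision is 0.5, and "pricing" is reported as a coverage gap because no retrieved document mentions it.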
Calibration accuracy
- F1: 0.824
- Precision: 0.718
- Recall: 0.968
From the Pisama calibration set. See detector scoreboard for the full table.
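The three figures are related by the standard F1 definition, the harmonic mean of precision and recall. A minimal check, assuming the reported F1 was computed this way (the helper name `f1_score` is illustrative):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.718, 0.968)
print(round(f1, 3))  # 0.824
```

The high recall and lower precision suggest the detector rarely misses a retrieval failure but flags some retrievals that were acceptable.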
Detect this in production with the framework adapters (LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Claude Agent SDK, n8n, Dify). See the full taxonomy at /taxonomy.