Ext · Cross-cuttingEnterprise

Retrieval Quality

Detects when agents retrieve wrong, irrelevant, or insufficient documents for a task. Retrieval is the primary bottleneck in RAG systems.

Examples

  • Agent retrieves marketing materials when the question is about engineering specs
  • Agent retrieves 10 documents but only 2 are relevant to the query
  • Critical document about pricing is missing from the retrieved set
  • Query about 2024 Q4 results returns documents from 2023

Detection methods

Relevance Scoring
Measures semantic alignment between query and retrieved docs
Coverage Analysis
Detects gaps in topic coverage across retrieved documents
Precision Measurement
Ratio of useful vs total retrieved documents
Query Alignment
Semantic match between query intent and retrieved content

Calibration accuracy

F1
0.824
Precision
0.718
Recall
0.968

From the Pisama calibration set. See detector scoreboard for the full table.

Detect this in production with the framework adapters (LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Claude Agent SDK, n8n, Dify). See the full taxonomy at /taxonomy.