LangGraph failure detection in three lines

Pisama ships a dedicated LangGraph adapter that detects six structural failures the LangSmith UI does not surface: unbounded recursion, checkpoint corruption, parallel-branch desync, tool-failure cascades, edge misroute, and state corruption inside `MessagesState`.

The adapter wraps the compiled graph and emits OpenTelemetry spans with `gen_ai.*` semantic conventions. Detectors run locally on every step transition; LLM-judge tier escalates only on ambiguous coordination calls. Median overhead at T1–T3 is under 5 ms per step.

No vendor lock-in: the same trace can be ingested by LangSmith, Phoenix, Langfuse, or any OTel-compatible backend in parallel.

Detectors specific to LangGraph

  • Recursion depth
    F1 0.976: unbounded recursion in conditional edges
  • Tool-failure cascade
    F1 0.900: repeated tool errors propagating across nodes
  • Parallel sync
    F1 0.874: diverged state across `Send` parallel branches
  • Checkpoint corruption
    F1 0.871: state schema drift across checkpoints
  • State corruption
    F1 0.809: type/shape changes in `MessagesState`
  • Edge misroute
    F1 0.835: conditional edge routes to unintended next node

Install

pip install pisama pisama-langgraph
from pisama.langgraph import instrument
from langgraph.graph import StateGraph

graph = StateGraph(...).compile()
instrument(graph)  # auto-emit OTel spans + run detectors

Tested against LangGraph 0.2+ and langchain-core 0.3+.

FAQ

Does Pisama replace LangSmith?
No, they answer different questions. LangSmith is artifact-level: it stores traces and runs LLM-judge graders on outputs. Pisama is process-level: it detects loops, recursion, and state corruption while the graph is executing. Most teams run both in parallel; the LangGraph adapter emits standard OTel spans that either backend can ingest.
Does instrumentation slow down my graph?
Tier-1 hash and tier-2 delta detectors run in under 5 ms per step transition. Tier-3 embedding detectors run on a sampled subset by default. Tier-4 LLM judges run async out-of-band and never block execution.
Which LangGraph features are supported?
StateGraph, MessageGraph, Send-based parallelism, conditional edges, checkpointing (in-memory + Postgres), and human-in-the-loop interrupts. The adapter is tested against LangGraph 0.2+.
How do I detect recursion in a graph that loops by design?
The recursion detector uses subsequence matching on the visited-node sequence, not raw step count. A loop of (planner, researcher, planner, researcher) is caught even when each step is unique. The default threshold is 3 cycle repeats; tune via `recursion_max_cycles`.

See the full detector taxonomy at /taxonomy, benchmark numbers at /benchmarks, or compare against other observability stacks at /vs.