CrewAI failure detection without rebuilding your crew

CrewAI orchestrates role-based agents through sequential and hierarchical processes. Pisama detects the failures that role-based coordination introduces: persona drift (agents behaving outside their assigned role), task hand-off breaks (output of one agent never referenced by the next), coordination loops between agents, and shared-context corruption.

The CrewAI adapter hooks into `Crew.kickoff()` and `Task.execute()`; no manual instrumentation is required beyond a single `instrument_crewai()` call. Both sequential and hierarchical processes are supported.

Heuristic detectors catch 90%+ of failures locally at zero cost. Detection escalates to the LLM-judge tier only when coordination intent is genuinely ambiguous.
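
The heuristic-first, judge-second flow can be sketched roughly as follows. This is a minimal illustration and not Pisama's implementation: `Verdict`, `detect`, `repeated_line_heuristic`, the `threshold` parameter, and the confidence values are all hypothetical names and numbers chosen for the example, and the judge is passed in as a plain callable rather than a real LLM client.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Verdict:
    failed: bool
    tier: str  # which tier produced the verdict


def repeated_line_heuristic(text: str) -> Tuple[bool, float]:
    """Hypothetical local check: returns (failed, confidence)."""
    lines = [ln.strip().lower() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return False, 0.0
    ratio = (len(lines) - len(set(lines))) / len(lines)
    if ratio >= 0.5:
        return True, 0.9   # mostly duplicated lines: confident failure
    if ratio == 0:
        return False, 0.9  # no duplication: confident pass
    return False, 0.5      # some duplication: ambiguous


def detect(
    text: str,
    heuristic: Callable[[str], Tuple[bool, float]],
    llm_judge: Optional[Callable[[str], bool]] = None,
    threshold: float = 0.8,
) -> Verdict:
    """Run the cheap heuristic first; call the judge only when it is unsure."""
    failed, confidence = heuristic(text)
    if confidence >= threshold or llm_judge is None:
        return Verdict(failed, "heuristic")
    return Verdict(llm_judge(text), "llm_judge")
```

In this sketch the judge is called only for inputs the heuristic scores below `threshold`, so clear-cut steps never leave the local tier.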

Detectors specific to CrewAI

  • Persona drift
    F1 0.794: agent output diverges from declared role/backstory
  • Coordination failure
    F1 0.746: Agent B never references Agent A's output
  • Communication breakdown
    F1 0.769: back-and-forth exchanges without progress
  • Information withholding
    F1 0.867: agent has answer in context, omits it
  • Loop detection
    F1 0.830: repeated state across delegation cycles
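
To make two of the detectors above concrete, here is a rough sketch of what a local heuristic for a hand-off break and for a delegation loop might look like. It is an assumption-laden illustration, not the shipped detectors: `handoff_overlap` and `first_repeated_state` are hypothetical helpers, word overlap stands in for whatever reference check Pisama actually uses, and states are assumed to be plain strings.

```python
import hashlib
import re
from typing import List, Optional, Set, Tuple


def _content_words(text: str) -> Set[str]:
    # Lowercased alphanumeric tokens longer than three characters.
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def handoff_overlap(upstream_output: str, downstream_output: str) -> float:
    """Fraction of upstream content words that reappear downstream.

    A value near 0.0 suggests the downstream agent ignored the hand-off.
    """
    upstream = _content_words(upstream_output)
    if not upstream:
        return 0.0
    return len(upstream & _content_words(downstream_output)) / len(upstream)


def first_repeated_state(states: List[str]) -> Optional[Tuple[int, int]]:
    """Return (first_index, repeat_index) of the first repeated state, if any."""
    seen = {}
    for i, state in enumerate(states):
        # Normalize whitespace and case so trivial variations still match.
        normalized = " ".join(state.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            return seen[digest], i
        seen[digest] = i
    return None
```

A caller would compare `handoff_overlap` against a threshold of its own choosing and treat any non-`None` result from `first_repeated_state` as a candidate loop.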

Install

pip install pisama pisama-auto

from pisama.auto import instrument_crewai
from crewai import Crew

instrument_crewai()  # patches Crew.kickoff + Task.execute

crew = Crew(agents=[...], tasks=[...])
result = crew.kickoff()  # detectors run on every step

Tested against CrewAI 0.30+.

FAQ

Does Pisama work with hierarchical CrewAI processes?
Yes. The adapter handles both sequential and hierarchical processes. For hierarchical processes, the manager-agent delegation chain is captured as a graph, and coordination detectors run across the full hierarchy.

Can I detect when an agent steps outside its role?
Yes. The persona-drift detector compares agent output against the declared role, goal, and backstory using embedding similarity (T3) and an LLM judge (T4). It flags cases where, for example, an analyst agent starts writing marketing copy or a researcher starts proposing implementations.

Does this work with CrewAI Flows?
Yes. Flow steps are captured as discrete spans, and Pisama detectors apply at the step level. Conditional flows benefit especially from the edge-misroute and coordination detectors.
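
As an illustration of the role-versus-output comparison mentioned in the persona-drift answer, the sketch below scores an output against an agent's role, goal, and backstory. It is only a stand-in: bag-of-words cosine similarity replaces a real embedding model so the example runs without dependencies, and `persona_similarity` is a hypothetical helper, not part of Pisama or CrewAI.

```python
import math
import re
from collections import Counter


def _bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def persona_similarity(role: str, goal: str, backstory: str, output: str) -> float:
    """Similarity between an agent's declared persona and one of its outputs.

    Assumption: a production detector would use embeddings here; word
    counts are used only to keep the sketch self-contained.
    """
    profile = _bag_of_words(" ".join([role, goal, backstory]))
    return cosine(profile, _bag_of_words(output))


role = "Financial analyst"
goal = "Analyze quarterly revenue data and report trends"
backstory = "You are a careful analyst who works with financial data"

on_role = persona_similarity(
    role, goal, backstory,
    "Quarterly revenue data shows two trends: revenue rose and costs fell",
)
off_role = persona_similarity(
    role, goal, backstory,
    "Buy now! Our amazing sneakers make every morning run feel effortless",
)
```

Here the analyst-style output scores higher than the marketing copy; outputs whose score falls in between would be the ones worth passing to a judge.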

See the full detector taxonomy at /taxonomy, benchmark numbers at /benchmarks, or compare against other observability stacks at /vs.