AutoGen group chat failure detection

Name: Pisama
Author: Pisama

AutoGen group chats have the highest failure rate of any orchestration pattern in production. In the TRAIL benchmark, two-agent loops, premature termination, and role bleed account for more than 60% of failed runs. Pisama detects all three structurally, with no LLM judge.

The AutoGen adapter instruments `GroupChatManager.run()` and `ConversableAgent.initiate_chat()`. After each turn, the conversation state is hashed for loop detection, and speaker selection is tracked for the termination and coordination detectors.
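One way to picture this kind of instrumentation is a wrapper that patches a per-turn method and records a state hash and the speaker after every call. The sketch below is a minimal illustration under assumptions: `ToyChat` stands in for an AutoGen chat object, `TurnTracker` and `instrument` are hypothetical names, and it does not claim to show how Pisama's adapter is actually built.

```python
from __future__ import annotations

import functools
import hashlib
import json
from typing import Any, Callable


class TurnTracker:
    """Hypothetical recorder: one state hash and one speaker per turn."""

    def __init__(self) -> None:
        self.state_hashes: list[str] = []
        self.speakers: list[str] = []

    def record(self, speaker: str, state: dict[str, Any]) -> None:
        # Canonical JSON (sorted keys) so equal states always hash identically.
        blob = json.dumps(state, sort_keys=True, default=str).encode()
        self.state_hashes.append(hashlib.sha256(blob).hexdigest())
        self.speakers.append(speaker)


def instrument(cls: type, method_name: str, tracker: TurnTracker) -> None:
    """Replace cls.<method_name> with a wrapper that records each turn after it runs."""
    original: Callable[..., Any] = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self: Any, speaker: str, content: str) -> Any:
        result = original(self, speaker, content)
        # Here the "state" is just the latest turn; a real adapter could hash more.
        tracker.record(speaker, {"speaker": speaker, "content": content})
        return result

    setattr(cls, method_name, wrapper)


class ToyChat:
    """Stand-in for a group chat: step() appends one turn to the log."""

    def __init__(self) -> None:
        self.log: list[tuple[str, str]] = []

    def step(self, speaker: str, content: str) -> int:
        self.log.append((speaker, content))
        return len(self.log)


tracker = TurnTracker()
instrument(ToyChat, "step", tracker)

chat = ToyChat()
chat.step("coder", "draft v1")
chat.step("critic", "needs tests")
chat.step("coder", "draft v1")

print(tracker.speakers)  # ['coder', 'critic', 'coder']
print(tracker.state_hashes[0] == tracker.state_hashes[2])  # True
```

Wrapping the method, rather than changing call sites, means user code keeps calling the original API while every turn still passes through the tracker.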

Detectors specific to AutoGen

Loop detection (F1 0.830): same state recurring across turns
Coordination failure (F1 0.746): speaker never addresses prior speaker
Persona drift (F1 0.794): agent ignores assigned role/expertise
Communication breakdown (F1 0.769): back-and-forth without progress
Information withholding (F1 0.867): agent withholds known answer
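The list above gives each detector only a one-line description. To illustrate what a structural, LLM-free check for "back-and-forth without progress" could look like, here is a hypothetical heuristic in Python. It is an assumption for illustration, not Pisama's Communication breakdown detector: `ProgressMonitor`, `min_new`, and `patience` are invented names, and "progress" is approximated crudely as new vocabulary.

```python
from __future__ import annotations


class ProgressMonitor:
    """Hypothetical heuristic, not Pisama's detector.

    A turn counts as progress if it adds at least `min_new` words not seen
    earlier in the chat. The monitor flags a stall once the last `patience`
    non-progress turns alternate between exactly two speakers.
    """

    def __init__(self, min_new: int = 3, patience: int = 4) -> None:
        self.min_new = min_new
        self.patience = patience
        self.seen: set[str] = set()
        self.stalled: list[str] = []  # speakers of the current run of non-progress turns

    def observe(self, speaker: str, content: str) -> bool:
        words = set(content.lower().split())
        new_words = words - self.seen
        self.seen |= words
        if len(new_words) >= self.min_new:
            self.stalled.clear()  # real progress resets the run
            return False
        self.stalled.append(speaker)
        run = self.stalled[-self.patience:]
        return (
            len(run) == self.patience
            and len(set(run)) == 2
            and all(a != b for a, b in zip(run, run[1:]))
        )


monitor = ProgressMonitor()
chat = [
    ("coder", "here is the parser implementation using regex"),
    ("critic", "the regex is wrong"),
    ("coder", "the regex is fine"),
    ("critic", "the regex is wrong"),
    ("coder", "the regex is fine"),
]
print([monitor.observe(s, c) for s, c in chat])
# [False, False, False, False, True]
```

Requiring strict alternation between two speakers keeps the check focused on ping-pong exchanges; a quiet stretch from a single agent is not flagged.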

Install

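```shell
pip install pisama pisama-auto
```

```python
from pisama.auto import instrument_autogen
from autogen import GroupChat, GroupChatManager

# Enable Pisama's AutoGen instrumentation.
instrument_autogen()

# Agents elided; GroupChat also accepts the initial message list.
manager = GroupChatManager(groupchat=GroupChat(agents=[...], messages=[]))
manager.run(message="...")  # detectors run on every turn
```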

FAQ

Does this work with AutoGen 0.4 (the rewrite)?

Yes. The adapter targets the `autogen-agentchat` and `autogen-core` packages. The legacy `pyautogen` package is also supported via the `pisama-auto` shim.

How do you detect a two-agent loop without an LLM?

Hash the (sender, receiver, content-fingerprint) tuple per turn. If the same tuple recurs within a configurable window, it is a loop. Subsequence matching catches longer cycles (A, B, C, A, B, C). No LLM call needed.
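A minimal Python sketch of that hashing approach follows. `fingerprint`, `LoopDetector`, `window`, and `max_cycle` are hypothetical names, not Pisama's API, and the fingerprint is assumed to be a hash of case- and whitespace-normalized text; the real content fingerprint may differ.

```python
from __future__ import annotations

import hashlib
from collections import deque
from typing import Optional


def fingerprint(content: str) -> str:
    """Assumed fingerprint: hash of lower-cased, whitespace-collapsed text."""
    normalized = " ".join(content.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()[:16]


class LoopDetector:
    """Hypothetical sketch of window-based repeats plus cycle matching."""

    def __init__(self, window: int = 2, max_cycle: int = 4) -> None:
        if window < 1 or max_cycle < 2:
            raise ValueError("window must be >= 1 and max_cycle >= 2")
        self.window = window        # how many previous turns count as "recent"
        self.max_cycle = max_cycle  # longest cycle length to look for
        # Keep enough turns for the repeat window and two copies of the longest cycle.
        self.history: deque[tuple[str, str, str]] = deque(
            maxlen=max(window, 2 * max_cycle)
        )

    def observe(self, sender: str, receiver: str, content: str) -> Optional[str]:
        key = (sender, receiver, fingerprint(content))
        recent = list(self.history)[-self.window:]
        self.history.append(key)

        # 1. Exact repeat: the same tuple appeared within the last `window` turns.
        if key in recent:
            return "repeat"

        # 2. Cycle: the last k turns equal the k turns before them (A, B, C, A, B, C).
        turns = list(self.history)
        for k in range(2, self.max_cycle + 1):
            if len(turns) >= 2 * k and turns[-k:] == turns[-2 * k:-k]:
                return f"cycle of length {k}"
        return None


ping_pong = LoopDetector()
print(ping_pong.observe("coder", "critic", "Fix the bug"))    # None
print(ping_pong.observe("critic", "coder", "Still failing"))  # None
print(ping_pong.observe("coder", "critic", "fix the  bug"))   # repeat

three_way = LoopDetector()
turns = [("planner", "coder", "plan"), ("coder", "critic", "code"), ("critic", "planner", "review")]
results = [three_way.observe(*t) for t in turns * 2]
print(results[-1])  # cycle of length 3
```

With the default `window=2`, the exact-repeat check catches ping-pong between two agents, while a three-agent cycle falls outside the window and is caught by the cycle comparison instead.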

See the full detector taxonomy at /taxonomy, benchmark numbers at /benchmarks, or compare against other observability stacks at /vs.