Is Pisama a Braintrust scorer?

Name: Pisama
Author: Pisama

You can wrap Pisama detectors as Braintrust scorers. The detection happens locally; the result is a structured DiagnosisResult that maps cleanly to a Braintrust scorer output.

Use both. Here's where each one wins.

Observability tools see failures; Pisama acts on them. Those tools are the precondition layer, and Pisama is the action layer above it. The comparison below states plainly where each one is stronger; it is not a zero-sum claim.

Pisama vs Braintrust

Braintrust is excellent at eval workflow: dataset versioning, scoring functions, regression dashboards, and side-by-side comparison of model versions. Pisama does failure detection: when an agent run goes wrong, it names the failure and locates the step.

These are not competitors. Braintrust evaluates outputs against expected behavior; Pisama detects when execution itself misbehaves. Most teams shipping agentic systems need both.

Where Braintrust wins

Best-in-class eval workflow UX (dataset diffs, scorer authoring)
Tight CI integration: evals as part of the deploy gate
Fast playground for prompt iteration

Where Pisama wins

87 detectors, 6 externally validated at production grade, with published F1
In-flight detection: failures caught while the agent is still running
Single-agent, multi-agent, and sub-agent failures: coordination, loops, persona drift, silent cascades. Braintrust scorers do not target these.
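
To make "in-flight detection" concrete, here is a minimal sketch of checking an agent run after every step and stopping at the first named failure. It does not use Pisama's API: `repeated_action` and `run_with_guard` are hypothetical names, and the repeated-action rule is a toy stand-in for a real, calibrated detector.

```python
from typing import Callable, Iterable, Optional


def repeated_action(history: list[str], window: int = 3) -> Optional[str]:
    """Toy check (an assumption, not a Pisama detector): report "loop"
    when the last `window` actions are identical."""
    if len(history) >= window and len(set(history[-window:])) == 1:
        return "loop"
    return None


def run_with_guard(
    steps: Iterable[str],
    check: Callable[[list[str]], Optional[str]] = repeated_action,
) -> dict:
    """Consume agent steps one at a time and stop at the first failure.

    The check runs synchronously after each step, so the failure is named
    and located while later steps are still unexecuted.
    """
    history: list[str] = []
    for index, action in enumerate(steps):
        history.append(action)
        failure = check(history)
        if failure is not None:
            return {"failure": failure, "step": index, "completed": False}
    return {"failure": None, "step": None, "completed": True}


if __name__ == "__main__":
    print(run_with_guard(["plan", "search", "search", "search", "answer"]))
```

Because the guard returns as soon as the check fires, the remaining steps of the run are never consumed, which is the practical difference from scoring a finished trace post hoc.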

At a glance

Dimension	Braintrust	Pisama
Primary job	Output evaluation workflow	Process-level failure detection
When detection runs	Post-hoc against datasets	Synchronous, mid-execution
How scorers are authored	Custom code or LLM judge per scorer	Pre-calibrated detector packs
Agent failure depth	Trace UI; scorers per agent	Single, multi, and sub-agent detectors (coordination, loops, silent cascade)

"Externally validated at production grade" means: real-trace F1 of 0.80 or higher, precision of 0.70 or higher, 30 or more external traces, externally grounded thresholds, and no per-difficulty blind spot (capability registry, external-only lane, 2026-06-14).
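
The criteria above read naturally as a single predicate. The sketch below encodes them as stated; the `DetectorStats` record and its field names are hypothetical and are not taken from Pisama's capability registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorStats:
    """Hypothetical summary of one detector's external validation run."""
    f1: float                             # F1 measured on real traces
    precision: float                      # precision on the same traces
    external_traces: int                  # number of external traces used
    thresholds_externally_grounded: bool  # thresholds set from external data
    has_difficulty_blind_spot: bool       # fails at some difficulty level


def is_production_grade(stats: DetectorStats) -> bool:
    """Return True only when every listed criterion is met."""
    return (
        stats.f1 >= 0.80
        and stats.precision >= 0.70
        and stats.external_traces >= 30
        and stats.thresholds_externally_grounded
        and not stats.has_difficulty_blind_spot
    )


if __name__ == "__main__":
    # Illustrative numbers only, not published Pisama results.
    example = DetectorStats(0.84, 0.78, 42, True, False)
    print(is_production_grade(example))
```

All five conditions are conjunctive: a detector with a strong F1 but only 20 external traces, for example, would not qualify under this reading.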

Recommendation

Braintrust for "did this prompt change regress the eval set?" Pisama for "did this run fail, and where?" Run both.

FAQ

Is Pisama a Braintrust scorer?

You can wrap Pisama detectors as Braintrust scorers. The detection happens locally; the result is a structured DiagnosisResult that maps cleanly to a Braintrust scorer output.
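
As a rough illustration of that wrapping, the sketch below adapts a detector into a function with a scorer-style signature. Everything here is an assumption made for illustration: the `DiagnosisResult` fields are a stand-in (the real Pisama type may differ), `toy_loop_detector` is a placeholder rather than a Pisama detector, and the returned dict of `name`, `score` (0 to 1), and `metadata` mirrors the general shape of a scorer output rather than a confirmed Braintrust contract. Check both projects' documentation before relying on it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class DiagnosisResult:
    """Hypothetical stand-in for Pisama's result type; real fields may differ."""
    detected: bool
    failure_type: Optional[str] = None
    step_index: Optional[int] = None
    confidence: float = 0.0


def toy_loop_detector(trace: list[str]) -> DiagnosisResult:
    """Placeholder detector: flags the first immediately repeated step."""
    for i in range(1, len(trace)):
        if trace[i] == trace[i - 1]:
            return DiagnosisResult(True, "loop", i, 1.0)
    return DiagnosisResult(False)


def as_scorer(
    name: str, detector: Callable[[Any], DiagnosisResult]
) -> Callable[..., dict]:
    """Adapt a detector into a scorer-shaped function.

    The detector runs locally on the task output; a detected failure
    scores 0.0, a clean run scores 1.0, and the diagnosis is kept as metadata.
    """
    def scorer(input: Any = None, output: Any = None,
               expected: Any = None, **kwargs: Any) -> dict:
        result = detector(output)
        return {
            "name": name,
            "score": 0.0 if result.detected else 1.0,
            "metadata": {
                "failure_type": result.failure_type,
                "step_index": result.step_index,
                "confidence": result.confidence,
            },
        }
    return scorer


if __name__ == "__main__":
    no_loop = as_scorer("no_loop", toy_loop_detector)
    print(no_loop(output=["plan", "search", "search", "answer"]))
```

Keeping the failure type and step index in `metadata` means the eval dashboard shows not only that a run scored 0 but which failure was named and where.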