# Pisama: Agent Forensics

> Process-level failure detection for AI agent systems. The layer between observability and incident response.

Pisama tells you what went wrong during the run, by name: loops, state corruption, persona drift, coordination breakdown, withholding, injection. Observability shows you what happened. Rubric graders (Bedrock, Foundry, Vertex, Anthropic Managed Agents) score the artifact. Pisama is the missing layer in between.

## Positioning

- **Process-level, not artifact-level.** LLM-judge graders judge the output; they cannot see two agents looping on each other for 14 turns or shared state corrupted at step 7.
- **Heuristics first.** A 5-tier pipeline: T1 hash (~0 ms / $0), T2 delta (~1 ms / $0), T3 embeddings (~10 ms / $0), T4 LLM judge (~200 ms / ~$0.02), T5 human review (async). 90%+ of detections resolve in T1–T3 at $0.
- **Open-source SDK + CLI; hosted Heal in private beta.** Detect and Diagnose ship in the open-source SDK. Heal (fix suggestions, checkpoint rollback, detection memory, chaos engineering with blast-radius controls) is in development on pisama.ai.

## Numbers

- **39 calibrated detectors (85 total, 24 production-grade)** across 6 categories: Planning & Decomposition (6), Execution & State (7), Coordination (6), Verification & Quality (7), Behavior & Safety (7), Reasoning & Observability (5). Plus 21 framework-specific detectors across LangGraph, OpenClaw, n8n, Dify, and Managed Agents.
- **15 framework-specific detectors**: +5 LangGraph, +5 OpenClaw, +3 n8n, +2 Dify
- **TRAIL detection benchmark**: Pisama 59.9% joint accuracy vs 11.9% best frontier (GPT-5.4), a 48-point lead. Source: TRAIL benchmark, 148 traces, 841 labelled failures.
- **Who&When attribution benchmark (ICML 2025)**: Pisama + Sonnet 4 ties best LLM at 60.3% agent accuracy and leads on step accuracy at 24.1%. Pisama heuristic-only: 31% / 16.8%; the LLM-judge tier matters for attribution.

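
The tiered pipeline under Positioning escalates from cheap checks to expensive ones. The sketch below illustrates that idea for the first two tiers only: a T1 exact-hash check followed by a T2 field-level delta check over a list of state snapshots. It is not the Pisama implementation; the function names, the `min_repeats` and `max_changed_fields` parameters, and the state shape are hypothetical.

```python
import hashlib
import json
from collections import Counter
from typing import Optional


def state_hash(state: dict) -> str:
    """T1: stable hash of a state snapshot (keys sorted for determinism)."""
    payload = json.dumps(state, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_fields(prev: dict, curr: dict) -> int:
    """T2: number of keys whose value differs between two snapshots."""
    keys = set(prev) | set(curr)
    return sum(1 for k in keys if prev.get(k) != curr.get(k))


def detect_loop(
    states: list[dict],
    min_repeats: int = 3,          # hypothetical parameter
    max_changed_fields: int = 1,   # hypothetical parameter
) -> Optional[str]:
    """Return the tier that flagged a loop, or None if nothing was flagged."""
    # T1: the exact same state appears at least `min_repeats` times.
    counts = Counter(state_hash(s) for s in states)
    if counts and max(counts.values()) >= min_repeats:
        return "T1"
    # T2: a run of consecutive steps that barely change the state,
    # for example only a counter or timestamp moves.
    run = 1
    for prev, curr in zip(states, states[1:]):
        run = run + 1 if changed_fields(prev, curr) <= max_changed_fields else 1
        if run >= min_repeats:
            return "T2"
    # A fuller pipeline would escalate here (embeddings, then an LLM judge).
    return None
```

The cheap tier runs first and short-circuits, so the more expensive comparison only runs on traces that T1 could not settle. The T2 rule here is deliberately crude and would need tuning to avoid flagging agents that legitimately change one field per step.
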
## Architecture

- **Backend**: FastAPI, SQLAlchemy, PostgreSQL with pgvector, Alembic migrations. Deployed on Fly.io (api.pisama.ai).
- **Frontend**: Next.js (App Router), React, TailwindCSS, NextAuth (Google OAuth). Deployed on Vercel (pisama.ai).
- **SDK**: Python packages: `pisama-core` (orchestrator/scoring), `pisama-detectors` (calibrated detectors), `pisama-auto` (auto-instrumentation), `pisama-agent-sdk` (Claude Agent SDK hooks), `pisama-claude-code` (Claude Code traces). All MIT-licensed.
- **CLI**: Click-based CLI with MCP server (`pisama-mcp` for Cursor / Claude Desktop / Windsurf).
- **Database**: Fly Postgres 16 + pgvector. Redis on Fly for rate limiting + caching.

## Supported Frameworks (12 dedicated adapters + generic OTel)

- LangGraph (SDK adapter)
- Claude Agent SDK (real-time hooks)
- OpenAI Assistants / Responses API (trace ingestion via `pisama_core.adapters.parse_openai_*`)
- AWS Bedrock Agents (trace ingestion via `pisama_core.adapters.parse_bedrock_invoke_agent`)
- Google ADK
- LangChain Deep Agents
- n8n (webhook)
- Dify (webhook)
- OpenClaw (SDK adapter)
- Claude Managed Agents (webhook + API pull + /grade shim)
- Claude Code (OTEL + MCP)
- Cursor / Claude Desktop / Windsurf (via MCP)
- Generic OpenTelemetry ingestion, any framework emitting `gen_ai.*` semantic conventions (CrewAI, AutoGen, Semantic Kernel, others)

---

## API Reference

Base URL: https://api.pisama.ai/api/v1

Authentication: Bearer token or API key (X-Pisama-API-Key header)

Interactive Docs: https://api.pisama.ai/docs

### Authentication

#### POST /auth/tenants

Create a new tenant account. Returns tenant ID and API key.

- Rate limited: 5 requests per hour per IP.
- Request: `{ "name": "My Organization" }`
- Response: `{ "id": "", "name": "My Organization", "api_key": "pisama_...", "created_at": "..." }`

#### POST /auth/token

Exchange API key for a JWT bearer token.

- Request: `{ "api_key": "pisama_..." }`
- Response: `{ "access_token": "eyJ...", "token_type": "bearer" }`

#### POST /auth/api-keys

Create a named API key for programmatic access.

- Requires: Bearer token
- Request: `{ "name": "production-key" }`
- Response: `{ "id": "", "name": "production-key", "key": "pisama_...", "key_prefix": "pisama_abc", "created_at": "..." }`

#### GET /auth/api-keys

List all API keys for the current tenant.

#### DELETE /auth/api-keys/{key_id}

Revoke an API key.

#### GET /auth/me

Get current user information.

### Trace Ingestion

#### POST /traces/ingest

Ingest agent traces in OpenTelemetry format. Backpressure-aware with automatic load shedding.

- Status: 202 Accepted (async processing)
- Request body: `{ "resourceSpans": [...] }` (OTEL format)
- Automatic detection runs after ingestion.

#### GET /traces

List traces with pagination and filtering.

- Query params: `page`, `per_page`, `framework`, `status`, `date_from`, `date_to`
- Response: `{ "traces": [...], "total": 100, "page": 1, "per_page": 20 }`

#### GET /traces/{trace_id}

Get trace details including state timeline and detection results.

#### GET /traces/{trace_id}/states

Get all state snapshots for a trace, ordered by sequence number.

### Framework-Specific Ingestion

#### POST /n8n/webhook

Receive n8n workflow execution data. Auto-parses n8n node execution format.

- Request: `{ "executionId": "...", "workflowId": "...", "workflowName": "...", "mode": "manual", "startedAt": "...", "status": "success", "data": {...} }`
- Response: `{ "success": true, "trace_id": "...", "states_created": 5, "quality_assessment_triggered": false }`

#### POST /n8n/workflows

Register an n8n workflow for monitoring.

#### GET /n8n/workflows

List registered n8n workflows.

#### POST /langgraph/webhook

Receive LangGraph run data with step-by-step state tracking.

- Request: `{ "run_id": "...", "assistant_id": "...", "thread_id": "...", "graph_id": "...", "started_at": "...", "status": "completed", "steps": [...] }`
- Response: `{ "success": true, "trace_id": "...", "states_created": 8 }`

#### POST /langgraph/deployments

Register a LangGraph deployment.

#### GET /langgraph/deployments

List registered LangGraph deployments.

#### POST /dify/webhook

Receive Dify workflow execution data.

- Request: `{ "workflow_run_id": "...", "app_id": "...", "app_type": "workflow", "started_at": "...", "status": "succeeded", "nodes": [...] }`
- Response: `{ "success": true, "trace_id": "...", "states_created": 6 }`

#### POST /dify/instances

Register a Dify instance.

#### GET /dify/instances

List registered Dify instances.

#### POST /openclaw/webhook

Receive OpenClaw session data including multi-agent events.

- Request: `{ "session_id": "...", "instance_id": "...", "agent_name": "...", "channel": "whatsapp", "started_at": "...", "events": [...] }`
- Response: `{ "success": true, "trace_id": "...", "states_created": 12 }`

#### POST /openclaw/instances

Register an OpenClaw instance.

#### GET /openclaw/instances

List registered OpenClaw instances.

#### POST /traces/claude-code/ingest

Ingest Claude Code traces with tool use, reasoning, and cost data.

- Request: `{ "source": "claude-code", "version": "0.1.0", "uploaded_at": "...", "trace_count": 10, "traces": [...] }`
- Response: `{ "success": true, "traces_received": 10, "traces_stored": 10, "session_ids": [...] }`

### Conversation Traces

#### POST /conversations/ingest

Ingest multi-turn conversation traces. Supports MAST-Data, OpenAI messages, Claude conversation, and generic turn-based formats. Format is auto-detected.

#### GET /conversations

List conversation traces with pagination.

#### GET /conversations/{conversation_id}

Get conversation detail with turns and analysis.

#### POST /conversations/{conversation_id}/analyze

Run turn-aware detection analysis on a conversation.

### Detections

#### GET /detections

List detected failures with filtering and pagination.

- Query params: `page`, `per_page`, `detection_type`, `validated`, `confidence_min`, `confidence_max`, `trace_id`, `date_from`, `date_to`
- Response includes: `explanation` (human-readable), `business_impact`, `suggested_action`, `confidence_tier` (HIGH/LIKELY/POSSIBLE/LOW)

#### GET /detections/{detection_id}

Get detection details with explanation and fix suggestions.

#### PUT /detections/{detection_id}/validate

Mark a detection as validated or false positive.

- Request: `{ "false_positive": true, "notes": "This was expected behavior" }`

#### GET /detections/{detection_id}/fixes

Get AI-generated fix suggestions for a detection.

#### POST /detections/{detection_id}/fixes/{fix_id}/apply

Apply a fix suggestion (triggers healing workflow).

### Healing (Self-Repair)

#### POST /healing/trigger/{detection_id}

Trigger self-healing for a detected failure. Supports approval policies.

- Request: `{ "fix_id": "optional-specific-fix", "approval_required": false }`
- Response: `{ "healing_id": "...", "detection_id": "...", "status": "pending" }`
- Status transitions: pending -> in_progress -> applied/staged/failed, staged -> applied/rolled_back/rejected, applied -> rolled_back

#### GET /healing

List healing records with status tracking.

#### GET /healing/{healing_id}

Get healing record details.

#### POST /healing/{healing_id}/approve

Approve a staged healing action.

#### POST /healing/{healing_id}/rollback

Roll back an applied healing action.

### Agents

#### GET /agents

List agents derived from trace state data. Returns token usage, latency, step counts, and activity status.

### Analytics

#### GET /analytics/loops

Get loop detection analytics over time.

- Query params: `days` (1-365, default 30)
- Response: time series, loops by method, top affected agents

#### GET /analytics/costs

Get cost analytics (token usage, dollar costs).

#### GET /analytics/quality

Get workflow quality analytics with daily scores and issue counts.

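
The healing status transitions listed under `POST /healing/trigger/{detection_id}` form a small state machine. A client that polls `GET /healing/{healing_id}` could sanity-check the statuses it observes against that list. The sketch below encodes the documented transitions as a lookup table; the helper names are hypothetical, and the assumption that every record starts at `pending` is inferred from the example response rather than stated by the API.

```python
# Allowed transitions, copied from the documented status list.
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "pending": {"in_progress"},
    "in_progress": {"applied", "staged", "failed"},
    "staged": {"applied", "rolled_back", "rejected"},
    "applied": {"rolled_back"},
}


def can_transition(current: str, target: str) -> bool:
    """True if `current -> target` appears in the documented transitions."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def is_valid_history(history: list[str]) -> bool:
    """Check a sequence of observed statuses, assuming it starts at pending."""
    if not history or history[0] != "pending":
        return False
    return all(can_transition(a, b) for a, b in zip(history, history[1:]))
```

Statuses with no outgoing entry (`failed`, `rejected`, `rolled_back`) are treated as terminal here, which matches the list above but is otherwise an assumption.
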
### Feedback

#### POST /feedback

Submit feedback on detection accuracy for threshold tuning.

- Request: `{ "detection_id": "", "is_correct": true, "reason": "...", "severity_rating": 3 }`

#### GET /feedback

List submitted feedback.

#### GET /feedback/stats

Get aggregated feedback statistics (precision, recall, F1 by framework/type/method).

#### GET /feedback/recommendations

Get threshold adjustment recommendations based on feedback data.

### Benchmarks

#### GET /benchmarks

Get complete benchmark results for all MAST failure modes with methodology transparency.

#### GET /benchmarks/summary

Get benchmark summary only (lighter endpoint).

#### GET /benchmarks/modes

Get failure modes, optionally filtered by tier or category.

- Query params: `tier` (1/2/3), `category` (content/structural/rag)

#### GET /benchmarks/methodology

Get benchmark methodology information (dataset size, sources, approaches).

### Diagnostics

#### GET /diagnostics/detector-status

Get detector health and readiness. Returns production/beta/experimental/failing status for each detector with F1 scores.

### Metrics

#### GET /metrics

Prometheus-format metrics export (text/plain).

#### GET /metrics/json

JSON metrics export (traces, detections, tokens, cost, detector F1/threshold/ECE).

#### POST /metrics/datadog/flush

Flush metrics to Datadog.

#### GET /metrics/datadog/dashboard

Get Datadog dashboard configuration.

### Settings

#### GET /settings/thresholds

Get current detection thresholds (global and per-framework).

#### PUT /settings/thresholds

Update detection thresholds.

- Request: `{ "global_thresholds": { "structural_threshold": 0.85, "semantic_threshold": 0.80 }, "framework_thresholds": { "langgraph": { "loop_detection_window": 8 } } }`

### Workflow Groups

#### POST /workflow-groups

Create a workflow group with optional auto-detect rules.

#### GET /workflow-groups

List workflow groups.

#### PUT /workflow-groups/{group_id}

Update a workflow group.

#### DELETE /workflow-groups/{group_id}

Delete a workflow group.

#### POST /workflow-groups/{group_id}/assign

Assign workflows to a group.

### Onboarding

#### GET /onboarding/status

Check onboarding progress (has traces, has detections).

#### POST /onboarding/demo

Load demo data for onboarding.

### Health

#### GET /health

Health check endpoint. Returns database, Redis, and overall status.

- Response: `{ "status": "healthy", "database": "healthy", "redis": "healthy", "version": "0.1.0" }`

---

## Detectors by Category

### Core Detectors (ICP Tier - Always Available)

1. **loop** - Loop Detection: Detects infinite loops, repetitive patterns, and cycling behavior in agent execution. Uses exact hash matching, structural comparison, and semantic similarity. Tiered: hash (T1) -> state delta (T2) -> embeddings (T3) -> LLM judge (T4).
2. **persona_drift** - Persona Drift: Detects when an agent deviates from its assigned persona, role, or behavioral constraints. Identifies role confusion and persona blending in multi-agent systems.
3. **hallucination** - Hallucination Detection: Identifies factual inaccuracies, fabricated information, and unsupported claims in agent outputs. Compares outputs against source documents and known facts.
4. **injection** - Injection Detection: Detects prompt injection attempts, jailbreak patterns, and adversarial inputs targeting agent systems.
5. **overflow** - Context Overflow: Detects context window exhaustion, token budget violations, and memory pressure issues in agent conversations.
6. **corruption** - State Corruption: Identifies invalid state transitions, data corruption between agent steps, and state inconsistencies. Compares current state against previous state snapshots.
7. **coordination** - Coordination Analysis: Detects coordination failures between agents including handoff errors, message loss, race conditions, and deadlocks in multi-agent systems.
8. **communication** - Communication Breakdown: Identifies inter-agent communication failures including message format mismatches, missing acknowledgments, and semantic misunderstandings.
9. **context** - Context Neglect: Detects when agents ignore or fail to use provided context, instructions, or relevant information in their responses.
10. **derailment** - Task Derailment: Identifies when agents go off-topic, lose focus on the assigned task, or pursue tangential goals.
11. **specification** - Specification Mismatch: Detects when agent output does not match the specified requirements, format constraints, or expected behavior defined in the task specification.
12. **decomposition** - Task Decomposition: Identifies failures in task breakdown including incorrect subtask ordering, missing dependencies, incomplete decomposition, and granularity issues.
13. **workflow** - Workflow Analysis: Detects structural issues in workflow execution including missing steps, incorrect ordering, parallel execution failures, and dependency violations.
14. **withholding** - Information Withholding: Detects when agents omit critical information from their responses that is available in their internal state or context.
15. **completion** - Completion Misjudgment: Identifies premature task completion (declaring done when incomplete) or delayed completion (continuing when task is finished).
16. **cost** - Cost Tracking: Monitors token usage and cost budgets. Detects budget overruns, cost spikes, and inefficient token usage patterns.
17. **convergence** - Convergence Detection: Detects metric plateau, regression, thrashing, and divergence in iterative agent processes.

### Enterprise Detectors (Feature Flag Required)

18. **grounding** - Grounding Detection: Verifies that agent claims are supported by source documents. Uses word overlap and citation checking.
19. **retrieval_quality** - Retrieval Quality: Evaluates the quality and relevance of retrieved documents in RAG pipelines.

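
The `grounding` detector above is described as using word overlap. As a rough illustration of that idea only, the sketch below scores a claim by the fraction of its tokens that appear in the best-matching source document. The function names, the tokenization, and the `threshold` default are hypothetical and are not taken from the Pisama detectors.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens; a deliberately simple tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(claim: str, sources: list[str]) -> float:
    """Best fraction of claim tokens found in any single source."""
    claim_tokens = _tokens(claim)
    if not claim_tokens:
        return 0.0
    return max(
        (len(claim_tokens & _tokens(src)) / len(claim_tokens) for src in sources),
        default=0.0,  # no sources means nothing can support the claim
    )


def is_grounded(claim: str, sources: list[str], threshold: float = 0.6) -> bool:
    """Hypothetical decision rule: enough of the claim appears in a source."""
    return overlap_score(claim, sources) >= threshold
```

Pure word overlap ignores negation and paraphrase, which is presumably why the detector description pairs it with citation checking.
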
### n8n-Specific Detectors

20. **n8n_schema** - N8N Schema Mismatch: Detects data schema mismatches between connected n8n nodes (type conflicts, missing fields, format errors).
21. **n8n_cycle** - N8N Graph Cycle: Identifies cycles in n8n workflow graphs that could cause infinite execution.
22. **n8n_complexity** - N8N Complexity: Flags overly complex n8n workflows (high node count, deep nesting, excessive branching).
23. **n8n_error** - N8N Error Handling: Detects missing or inadequate error handling in n8n workflows.
24. **n8n_resource** - N8N Resource Limits: Monitors resource consumption and detects workflows approaching memory, CPU, or execution limits.
25. **n8n_timeout** - N8N Timeout Protection: Detects workflows at risk of timeout due to long-running operations or external API dependencies.

### Dify-Specific Detectors

26. **dify_rag_poisoning** - Dify RAG Poisoning: Detects adversarial or corrupted documents injected into Dify knowledge bases.
27. **dify_iteration_escape** - Dify Iteration Escape: Identifies iteration nodes that fail to terminate or exceed configured limits.
28. **dify_model_fallback** - Dify Model Fallback: Detects silent model fallback events where the primary model fails and a weaker model is substituted.
29. **dify_variable_leak** - Dify Variable Leak: Identifies variable leakage between Dify workflow branches or conversation contexts.
30. **dify_classifier_drift** - Dify Classifier Drift: Detects intent classifier degradation over time in Dify chatbot applications.
31. **dify_tool_schema_mismatch** - Dify Tool Schema Mismatch: Identifies mismatches between tool definitions and actual tool call parameters.

### OpenClaw-Specific Detectors

32. **openclaw_session_loop** - OpenClaw Session Loop: Detects session-level loops in OpenClaw conversations across messaging channels.
33. **openclaw_tool_abuse** - OpenClaw Tool Abuse: Identifies excessive or inappropriate tool usage patterns by OpenClaw agents.
34. **openclaw_elevated_risk** - OpenClaw Elevated Risk: Flags sessions running in elevated mode without proper safeguards.
35. **openclaw_spawn_chain** - OpenClaw Spawn Chain: Detects unbounded agent spawning chains in multi-agent OpenClaw configurations.
36. **openclaw_channel_mismatch** - OpenClaw Channel Mismatch: Identifies responses formatted incorrectly for the target messaging channel (WhatsApp, Telegram, Slack, Discord).
37. **openclaw_sandbox_escape** - OpenClaw Sandbox Escape: Detects attempts to escape sandbox restrictions in OpenClaw agent execution.

### LangGraph-Specific Detectors

38. **langgraph_recursion** - LangGraph Recursion: Detects recursive graph execution that exceeds configured depth limits.
39. **langgraph_state_corruption** - LangGraph State Corruption: Identifies state corruption in LangGraph's state management, including reducer conflicts and partial updates.
40. **langgraph_edge_misroute** - LangGraph Edge Misroute: Detects conditional edge routing errors where execution follows unexpected paths.
41. **langgraph_tool_failure** - LangGraph Tool Failure: Identifies tool node failures including timeout, schema validation errors, and retry exhaustion.
42. **langgraph_parallel_sync** - LangGraph Parallel Sync: Detects synchronization issues in parallel graph execution branches.

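
Several of the framework-specific detectors above reason over the workflow graph itself; `n8n_cycle`, for instance, is described as identifying cycles that could cause infinite execution. A minimal sketch of that kind of structural check is a depth-first search that reports the first cycle it finds. The adjacency-list input shape and the `find_cycle` name are hypothetical; how Pisama actually represents n8n or LangGraph graphs is not specified here.

```python
from typing import Optional

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def find_cycle(graph: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one cycle as a node path (first node repeated last), or None."""
    color: dict[str, int] = {}
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # Back edge: nxt is already on the current path.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

The recursion is fine for workflow-sized graphs; a very deep graph would call for an explicit stack instead.
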
---

## MAST Failure Taxonomy (Benchmark Results)

### Tier 1: High Detection (>95%)

| Code | Name | Detection Rate | Description |
|------|------|---------------|-------------|
| F1 | Specification Mismatch | 98.0% | Output doesn't match what was requested |
| F2 | Poor Task Decomposition | 100.0% | Tasks broken down incorrectly |
| F5 | Flawed Workflow Design | 100.0% | Workflow has structural issues |
| F6 | Task Derailment | 100.0% | Agent goes off-topic |
| F7 | Context Neglect | 100.0% | Agent ignores provided context |
| F8 | Information Withholding | 100.0% | Agent omits critical info |
| F11 | Coordination Failure | 100.0% | Agents fail to coordinate |
| F13 | Quality Gate Bypass | 96.0% | Skips quality checks |

### Tier 2: Good Detection (60-95%)

| Code | Name | Detection Rate | Description |
|------|------|---------------|-------------|
| F14 | Completion Misjudgment | 84.0% | Declares done when incomplete |
| F3 | Resource Misallocation | 66.7% | Compute/time allocated poorly |
| F4 | Inadequate Tool Provision | 66.7% | Wrong tools used for task |
| F9 | Role Usurpation | 66.7% | Agent exceeds its role boundaries |
| F12 | Output Validation Failure | 66.7% | Output not validated properly |
| F10 | Communication Breakdown | 64.0% | Inter-agent comms fail |

### Tier 3: RAG/Grounding

| Code | Name | Description |
|------|------|-------------|
| F15 | Grounding Failure | Claims not supported by sources |
| F16 | Retrieval Quality Failure | Retrieves wrong/irrelevant docs |

### Methodology

- Dataset: 207MB, 20,575 traces
- Sources: HuggingFace, GitHub, Anthropic, Research Papers
- Frameworks tested: LangChain, LangGraph, n8n, Dify, OpenClaw, Claude Managed Agents, OpenAI, Anthropic
- Overall detection rate: 82.4% (13.7% improvement from baseline)

---

## SDK Usage

### pisama-core (Python)

The core detection, scoring, and healing engine.

```bash
pip install pisama-core
```

```python
from pisama_core import DetectionOrchestrator, ScoringEngine, Trace

# Initialize
orchestrator = DetectionOrchestrator()
scoring = ScoringEngine()


async def analyze(trace: Trace):
    # Analyze a trace; `await` has to run inside a coroutine
    result = await orchestrator.analyze(trace)
    severity = scoring.calculate_severity([result])
    return result, severity

# With a Trace built from your agent run:
# result, severity = asyncio.run(analyze(trace))
```

#### Key Classes

- `Trace`, `Span`, `Event`, `TraceMetadata` - Trace data models
- `DetectionOrchestrator` - Runs all registered detectors against a trace
- `DetectorRegistry` - Registry of available detectors
- `BaseDetector` - Base class for implementing custom detectors
- `DetectionResult`, `Evidence`, `FixRecommendation` - Detection output models
- `ScoringEngine` - Calculates severity scores from detection results
- `Thresholds`, `SeverityLevel` - Configurable detection thresholds
- `HealingEngine` - Orchestrates fix application
- `HealingPlan`, `FixContext`, `FixResult` - Healing data models
- `BaseFix` - Base class for implementing custom fixes
- `FixInjectionProtocol` - Protocol for injecting fixes into agent execution
- `EnforcementEngine`, `EnforcementLevel` - Fix enforcement configuration
- `AuditLogger`, `AuditEvent` - Audit trail for all detection and healing actions
- `PisamaConfig`, `DetectionConfig`, `HealingConfig` - Configuration models
- `PlatformAdapter` - Base adapter for framework integrations
- `PIIDetector`, `Tokenizer`, `TokenVault` - PII detection and tokenization

### pisama-agent-sdk (Python)

Hooks for Claude Agent SDK with real-time failure prevention.

```bash
pip install pisama-agent-sdk
```

```python
from pisama_agent_sdk import pre_tool_use_hook, post_tool_use_hook
from pisama_agent_sdk import configure_bridge

# Optional: customize configuration
configure_bridge(
    warning_threshold=40,
    block_threshold=60,
    timeout_ms=80,
)

# Register hooks with Agent SDK (`agent` is your existing agent instance)
agent.hooks.pre_tool_use = pre_tool_use_hook
agent.hooks.post_tool_use = post_tool_use_hook
```

#### Advanced Usage

```python
from pisama_agent_sdk import DetectionBridge, BridgeConfig
from pisama_agent_sdk.hooks import PreToolUseHook, PostToolUseHook

# Custom configuration
config = BridgeConfig(
    warning_threshold=30,
    block_threshold=50,
    detection_timeout_ms=60,
)
bridge = DetectionBridge(config=config)

# Custom hooks with matchers
pre_hook = PreToolUseHook(bridge=bridge)
post_hook = PostToolUseHook(bridge=bridge)

agent.hooks.pre_tool_use = pre_hook
agent.hooks.post_tool_use = post_hook
```

#### Tool Matchers

```python
from pisama_agent_sdk import (
    ALL_TOOLS,           # Match all tool calls
    FILE_TOOLS,          # Match file read/write/edit tools
    SHELL_TOOLS,         # Match bash/shell tools
    DANGEROUS_COMMANDS,  # Match rm, git reset, etc.
    AGENT_TOOLS,         # Match agent/subagent tools
    create_matcher,      # Create custom matchers
)
```

### pisama-claude-code (Python)

Claude Code integration with trace capture and guardian hooks.

```bash
pip install pisama-claude-code
```

```python
from pisama_claude_code import install

# Install hooks into Claude Code
install()
```

---

## Integration Examples

### n8n Integration

Add a webhook node at the end of your n8n workflow:

```
POST https://api.pisama.ai/api/v1/n8n/webhook
Headers:
  X-Pisama-API-Key: your-api-key
  Content-Type: application/json
Body:
{
  "executionId": "{{ $execution.id }}",
  "workflowId": "{{ $workflow.id }}",
  "workflowName": "{{ $workflow.name }}",
  "mode": "{{ $execution.mode }}",
  "startedAt": "{{ $execution.startedAt }}",
  "status": "success",
  "data": {{ JSON.stringify($input.all()) }}
}
```

### LangGraph Integration

```python
import httpx
from langgraph.graph import StateGraph

PISAMA_API_KEY = "your-api-key"
PISAMA_URL = "https://api.pisama.ai/api/v1/langgraph/webhook"


async def send_to_pisama(run_data: dict):
    async with httpx.AsyncClient() as client:
        response = await client.post(
            PISAMA_URL,
            json=run_data,
            headers={"X-Pisama-API-Key": PISAMA_API_KEY},
        )
        # Surface rejected payloads instead of failing silently
        response.raise_for_status()


# After graph execution (call from async code; the values come from your run)
await send_to_pisama({
    "run_id": run_id,
    "assistant_id": assistant_id,
    "thread_id": thread_id,
    "graph_id": graph_id,
    "started_at": started_at,
    "finished_at": finished_at,
    "status": "completed",
    "total_tokens": total_tokens,
    "total_steps": len(steps),
    "steps": steps,
})
```

### OpenTelemetry Integration (Any Framework)

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Configure OTEL to send to Pisama
exporter = OTLPSpanExporter(
    endpoint="https://api.pisama.ai/api/v1/traces/ingest",
    headers={"X-Pisama-API-Key": "your-api-key"},
)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("my-agent")

with tracer.start_as_current_span("agent-step") as span:
    span.set_attribute("gen_ai.agent_id", "planner")
    span.set_attribute("gen_ai.token_count", 1500)
    # ... Agent logic
```

### Dify Integration

Configure a webhook in your Dify app's post-execution hook:

```
POST https://api.pisama.ai/api/v1/dify/webhook
Headers:
  X-Pisama-API-Key: your-api-key
Body:
{
  "workflow_run_id": "run-id",
  "app_id": "app-id",
  "app_name": "My Dify App",
  "app_type": "workflow",
  "started_at": "2025-01-01T00:00:00Z",
  "status": "succeeded",
  "total_tokens": 2500,
  "nodes": [...]
}
```

### OpenClaw Integration

```
POST https://api.pisama.ai/api/v1/openclaw/webhook
Headers:
  X-Pisama-API-Key: your-api-key
Body:
{
  "session_id": "session-id",
  "instance_id": "instance-id",
  "agent_name": "support-bot",
  "channel": "whatsapp",
  "started_at": "2025-01-01T00:00:00Z",
  "status": "completed",
  "message_count": 15,
  "events": [...]
}
```

---

## Detection Configuration

### Confidence Tiers

- **HIGH** (>=80%): Strong signal, likely a real failure
- **LIKELY** (60-79%): Probable failure, review recommended
- **POSSIBLE** (40-59%): Potential issue, may be expected behavior
- **LOW** (<40%): Weak signal, informational only

### Threshold Customization

```python
import httpx

# Update detection thresholds
httpx.put(
    "https://api.pisama.ai/api/v1/settings/thresholds",
    json={
        "global_thresholds": {
            "structural_threshold": 0.85,
            "semantic_threshold": 0.80,
            "loop_detection_window": 8,
            "min_matches_for_loop": 3,
            "confidence_scaling": 1.0,
        },
        "framework_thresholds": {
            "langgraph": {
                "loop_detection_window": 10,
                "semantic_threshold": 0.75,
            },
        },
    },
    headers={"Authorization": "Bearer "},
)
```

### Readiness Tiers

- **Production**: F1 >= 0.80, Precision >= 0.70, 30+ samples
- **Beta**: F1 >= 0.65, 15+ samples
- **Experimental**: F1 >= 0.40, 8+ samples

---

## Healing Workflow

1. Detection triggers fix suggestion generation
2. Fix generators produce framework-specific remediation code
3. Approval policy determines if auto-apply or human review
4. Status transitions: pending -> in_progress -> applied/staged/failed
5. Applied fixes can be rolled back at any time
6. Verification orchestrator confirms fix effectiveness

### Available Fix Generators

- Loop fixes (break loop, add exit conditions)
- Corruption fixes (state rollback, validation)
- Persona fixes (re-anchor persona, add constraints)
- Deadlock fixes (timeout, priority adjustment)
- Hallucination fixes (add grounding, source checking)
- Injection fixes (input sanitization, guardrails)
- Overflow fixes (context pruning, summarization)
- Derailment fixes (task re-focusing, guardrails)
- Context neglect fixes (context injection, attention)
- Communication fixes (message format, protocol)
- Specification fixes (output validation, constraints)
- Decomposition fixes (re-planning, dependency ordering)
- Workflow fixes (step insertion, reordering)
- Withholding fixes (completeness checking)
- Completion fixes (progress tracking, criteria)
- Cost fixes (budget enforcement, optimization)

---

## Links

- Website: https://pisama.ai
- Documentation: https://docs.pisama.ai
- API Docs: https://api.pisama.ai/docs
- GitHub: https://github.com/Pisama-AI/pisama

## License

MIT