Back to projects
Feb 09, 2026
5 min read

AgentWatch: Production-Grade Observability for AI Agents

A lightweight, self-hosted monitoring platform for AI agents that tracks token usage, latency, error rates, and cost per task in real-time — with zero-code integration and no vendor lock-in.

As AI agents move into production, knowing what they cost, how often they fail, and how fast they respond becomes critical. AgentWatch is a self-hosted observability platform that gives you complete visibility into your AI agent workflows — token usage, latency (p50/p95), error rates, and cost per task — all in real-time, with zero vendor lock-in. The entire system is a single FastAPI server (~364 lines), a SQLite database, and a vanilla JS dashboard with no build step required.

Key Features

FeatureDetails
Zero-Code IntegrationAuto-patches OpenAI and Anthropic clients — no code changes needed
Real-Time StreamingLive event feed via Server-Sent Events (SSE)
Cost TrackingBuilt-in per-model pricing for GPT-4o, Claude, and more
Latency Insightsp50, p95, and average latency per agent/model/task
Error MonitoringError rates broken down by agent, model, and task
Self-HostedSQLite + FastAPI — no Postgres, no external dependencies
Pip Installablepip install -e . and agentwatch serve to start

Architecture

┌──────────────────────────────────────────────────────────────┐
│                     Your AI Agent Code                       │
└──────────────────────────────────────────────────────────────┘

                   ┌──────────────────┐
                   │  AgentWatch SDK  │
                   │ (auto-patching,  │
                   │  decorator,      │
                   │  manual log)     │
                   └──────────────────┘

                   ┌──────────────────┐
                   │  Background      │
                   │  Event Sender    │
                   │  (batching,      │
                   │   threading)     │
                   └──────────────────┘

                ┌─────────────────────────┐
                │  FastAPI Server (8787)  │
                │  ├─ REST API endpoints  │
                │  ├─ SSE streaming       │
                │  └─ Static dashboard    │
                └─────────────────────────┘

                ┌─────────────────────────┐
                │   SQLite Database       │
                │  (events, metrics)      │
                └─────────────────────────┘

                ┌─────────────────────────┐
                │  Dashboard (Browser)    │
                │  ├─ Metric cards        │
                │  ├─ Charts (cost, lat)  │
                │  ├─ Live event feed     │
                │  └─ Breakdown tables    │
                └─────────────────────────┘

Components

ComponentPurposeTech
SDKClient instrumentation libraryPython threading, httpx
ServerREST API + SSE, async event handlingFastAPI, aiosqlite
DatabaseEvent storage & metrics aggregationSQLite, async queries
DashboardReal-time monitoring UIVanilla JS, Chart.js, SSE

SDK Integration Modes

AgentWatch provides four ways to instrument your agents:

1. Auto-Patching (Zero Code Changes)

import agentwatch

agentwatch.init(
    server_url="http://localhost:8787",
    agent="research-bot",
    auto_patch=True
)

# All OpenAI/Anthropic calls are now tracked automatically
client = openai.OpenAI()
response = client.chat.completions.create(model="gpt-4o", ...)
# ✅ Tokens, latency, cost recorded automatically

2. Decorator (Selective Tracking)

@track(task="summarize-documents", agent="research-bot")
def summarize(text):
    response = client.chat.completions.create(...)
    return response.choices[0].message.content

3. Manual Logging (Full Control)

agentwatch.log_event(
    task_name="custom-analysis",
    agent_name="analytics-bot",
    model="gpt-4o-mini",
    input_tokens=500,
    output_tokens=100,
    latency_ms=1200.0,
    status="success"
)

4. Class-Based API

watch = AgentWatch(server_url="http://localhost:8787", agent="my-agent")
watch.log_event(task_name="query", model="gpt-4o", ...)

Dashboard

The dashboard is a single HTML file (~48 KB) with no build step, featuring a dark theme designed for long monitoring sessions:

  • Metric Cards — Total cost, request count with success rate badge, p50/p95 latency, active agent count
  • Cost & Tokens Chart — Dual Y-axis with cost as filled gradient line and tokens as dashed purple line
  • Latency Chart — p50 and p95 with shaded band between them
  • Live Event Feed — Real-time SSE stream with color-coded agent badges, task labels, token counts, and status dots
  • Breakdown Tables — Sortable by agent, model, or task with color-coded error rate bars

Built-In Cost Tracking

Pricing for popular models is included out of the box:

ModelInputOutput
gpt-4o$2.50/M$10.00/M
gpt-4o-mini$0.15/M$0.60/M
claude-opus-4$15.00/M$75.00/M
claude-sonnet-4$3.00/M$15.00/M
claude-haiku-3.5$0.80/M$4.00/M

Unsupported models fall back to default pricing.

Performance

MetricValue
Event ingestion~5,000 events/sec (batch mode)
Query latency<50ms for typical queries
Dashboard refresh~500ms (parallel fetches)
Storage~50 KB per 1,000 events
Memory~50 MB baseline

Tech Stack

  • Server: FastAPI with async/await throughout (~364 lines)
  • Database: SQLite via aiosqlite with indexed queries and parameterized SQL (no injection vulnerabilities)
  • SDK: Python with httpx, background threading for non-blocking event batching
  • Dashboard: Vanilla HTML/CSS/JS with Chart.js — no Node.js, no build step
  • CLI: agentwatch serve entry point with configurable host, port, and DB path
  • Demo: Backfill mode generates 7 days of realistic simulated data (~3,500 events) with reproducible seeding

Why AgentWatch

  • Cost Control — Know exactly what each agent, model, and task costs before the bill arrives
  • Reliability — Catch error rate spikes and latency regressions in real-time
  • Visibility — Full audit trail of every LLM call without external dependencies
  • Non-Intrusive — Tracking failures never break your application; events are silently dropped with a warning
  • No Vendor Lock-In — Self-hosted, open source, MIT licensed, zero external services required