Track every LLM call — tokens, cost, latency, and errors — with one import.

AI Monitoring

This guide shows you how to track every AI/LLM call in your application -- tokens consumed, cost per request, latency, and errors -- using the Codmir SDK.

Install

npm install @codmir/sdk

Initialize

Add the SDK to your application entry point:

import { CodmirClient } from '@codmir/sdk';

const codmir = new CodmirClient({
  apiKey: process.env.CODMIR_API_KEY,
  baseUrl: 'https://codmir.com/api',
});

For framework-specific initialization, set the traceAI flag to enable automatic AI call instrumentation:

import * as Codmir from '@codmir/sdk/nextjs';

Codmir.init({
  dsn: process.env.NEXT_PUBLIC_CODMIR_DSN,
  traceAI: true,
});
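
Both initialization snippets read credentials from process.env, which yields undefined when a variable is unset. The sketch below shows one way to fail fast at startup instead of sending unauthenticated requests later. `requireEnv` is a hypothetical helper written for this guide, not part of the Codmir SDK.

```typescript
// Hypothetical helper (not part of @codmir/sdk): returns the variable's value
// or throws a descriptive error when it is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

With this in place, the client options could use apiKey: requireEnv('CODMIR_API_KEY', process.env), so a missing key surfaces as a startup error.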

Using CodemirAI for Traced Calls

The @codmir/ai package provides a unified client that automatically emits events for monitoring:

import { createCodemirAI } from '@codmir/ai';

const ai = createCodemirAI({
  defaultProvider: 'anthropic',
  context: {
    projectId: 'my-project',
    name: 'My App',
    techStack: ['TypeScript', 'Next.js', 'Prisma'],
  },
  governance: {
    enabled: true,
    rateLimitPerMinute: 60,
  },
});

Every call through CodemirAI emits structured events with token counts, cost, and timing.

Simple Chat

const response = await ai.chat('Explain the auth flow in this codebase');
console.log(response);

Completion with Full Control

const response = await ai.complete({
  messages: [
    { role: 'system', content: 'You are a code reviewer.' },
    { role: 'user', content: 'Review this function for security issues.' },
  ],
  provider: 'anthropic',
  model: 'claude-sonnet-4-20250514',
  maxTokens: 2048,
});

console.log(response.content);
console.log(response.usage); // { inputTokens, outputTokens, totalTokens }

Streaming

for await (const chunk of ai.stream({
  messages: [{ role: 'user', content: 'Write a migration script' }],
})) {
  process.stdout.write(chunk.content);
}
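
Because streamed responses arrive incrementally, it can be useful to record the time to first chunk alongside the total duration. The following is a minimal sketch that assumes only what the loop above shows: the stream is an async iterable whose chunks expose a string content field. `collectStream`, `StreamChunk`, `StreamTiming`, and `fakeStream` are hypothetical names, not part of @codmir/ai.

```typescript
// Hypothetical chunk shape: only `content` is taken from the example above.
interface StreamChunk {
  content: string;
}

interface StreamTiming {
  text: string;                // concatenated chunk contents
  chunkCount: number;          // number of chunks received
  firstChunkMs: number | null; // ms until the first chunk, null for an empty stream
  totalMs: number;             // ms until the stream ended
}

// Consumes any async iterable of chunks and records basic timing.
// The clock is injectable so the function can be tested deterministically.
async function collectStream(
  stream: AsyncIterable<StreamChunk>,
  now: () => number = Date.now,
): Promise<StreamTiming> {
  const start = now();
  let text = '';
  let chunkCount = 0;
  let firstChunkMs: number | null = null;

  for await (const chunk of stream) {
    if (firstChunkMs === null) firstChunkMs = now() - start;
    text += chunk.content;
    chunkCount += 1;
  }

  return { text, chunkCount, firstChunkMs, totalMs: now() - start };
}

// Stand-in stream for local experiments; application code would pass the
// result of ai.stream({...}) instead.
async function* fakeStream(parts: string[]): AsyncGenerator<StreamChunk> {
  for (const content of parts) yield { content };
}
```

In application code the equivalent call would be collectStream(ai.stream({ messages })), assuming ai.stream returns an async iterable as the loop above suggests.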

Listening to AI Events

Subscribe to events for custom monitoring, logging, or alerting:

import { createCodemirAI } from '@codmir/ai';

const ai = createCodemirAI({ defaultProvider: 'anthropic' });

// Track all events
ai.on('*', (event) => {
  console.log(`[${event.type}]`, event.data);
});

// Track completions specifically
ai.on('request_completed', (event) => {
  const { provider, usage, requestId } = event.data;
  console.log(`Request ${requestId} completed via ${provider}`);
  console.log(`Tokens: ${usage.totalTokens}`);
});

// Track failures
ai.on('request_failed', (event) => {
  console.error(`AI request ${event.data.requestId} failed:`, event.data.error);
});

// Track rate limits
ai.on('rate_limit_reached', (event) => {
  console.warn(`Rate limit hit: ${event.data.limit}/min`);
});

// Track streaming
ai.on('stream_started', (event) => {
  console.log(`Stream started for ${event.data.provider}`);
});

Available event types: request_started, request_completed, request_failed, stream_started, stream_chunk, stream_ended, rate_limit_reached, governance_decision, context_updated, approval_required.
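
The handlers above can feed a small in-memory aggregator, for example to keep per-provider token totals and an overall error rate. This is a sketch under stated assumptions: the payload fields (provider, usage.totalTokens) are taken from the handler examples above, while `UsageAggregator` and its methods are hypothetical and not part of @codmir/ai.

```typescript
// Payload fields mirror the handler examples above; the rest is hypothetical.
interface CompletedData {
  provider: string;
  usage: { totalTokens: number };
}

class UsageAggregator {
  private tokensByProvider = new Map<string, number>();
  private completed = 0;
  private failed = 0;

  // Intended to be called from a 'request_completed' handler.
  recordCompleted(data: CompletedData): void {
    this.completed += 1;
    const previous = this.tokensByProvider.get(data.provider) ?? 0;
    this.tokensByProvider.set(data.provider, previous + data.usage.totalTokens);
  }

  // Intended to be called from a 'request_failed' handler.
  recordFailed(): void {
    this.failed += 1;
  }

  // Total tokens seen for a provider; 0 if the provider was never used.
  tokensFor(provider: string): number {
    return this.tokensByProvider.get(provider) ?? 0;
  }

  // Fraction of finished requests that failed; 0 when nothing has finished.
  errorRate(): number {
    const finished = this.completed + this.failed;
    return finished === 0 ? 0 : this.failed / finished;
  }
}
```

Wiring it up would be a matter of calling stats.recordCompleted(event.data) inside the request_completed handler and stats.recordFailed() inside the request_failed handler.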

Cost Tracking per Model

Start by checking which providers are configured:

const statuses = ai.getProviderStatus();

for (const status of statuses) {
  console.log(`${status.name}: ${status.configured ? 'ready' : 'not configured'}`);
}

For per-model cost tracking, use the Cortex SmartRouter, which scores models on cost, latency, and quality:

import { createCortex } from '@codmir/cortex';

const cortex = createCortex({
  // ... config
  models: [
    {
      id: 'claude-sonnet-4-20250514',
      provider: 'anthropic',
      tier: 'standard',
      costPer1kInput: 0.003,
      costPer1kOutput: 0.015,
      avgLatencyMs: 800,
      qualityScore: 0.95,
      capabilities: ['code', 'reasoning', 'analysis'],
      maxTokens: 200000,
      healthy: true,
      circuitOpen: false,
    },
  ],
});

const report = cortex.getIntelligenceReport();
console.log(report.router);    // health per model
console.log(report.scheduler); // concurrency, failure rate, cost stats
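
Given the usage object returned by ai.complete and the per-1k prices in a model entry, the cost of a single request is a weighted sum. The sketch below assumes that costPer1kInput and costPer1kOutput are prices per 1,000 tokens in the same currency unit. `estimateCost` and the two interfaces are hypothetical helpers, not part of @codmir/cortex.

```typescript
// Field names follow the usage object and model config shown above.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

interface ModelPricing {
  costPer1kInput: number;  // assumed: price per 1,000 input tokens
  costPer1kOutput: number; // assumed: price per 1,000 output tokens
}

// Hypothetical helper: estimated cost of one request.
function estimateCost(usage: TokenUsage, pricing: ModelPricing): number {
  const inputCost = (usage.inputTokens / 1000) * pricing.costPer1kInput;
  const outputCost = (usage.outputTokens / 1000) * pricing.costPer1kOutput;
  return inputCost + outputCost;
}
```

With the prices from the config above (0.003 and 0.015), a request using 1,200 input tokens and 350 output tokens comes to 0.0036 + 0.00525, or about 0.00885.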

Dashboard

In the Codmir dashboard, the AI Monitoring view shows:

Waterfall view -- every AI call in sequence with timing bars
Token usage -- input and output tokens per request
Cost breakdown -- cost per model, per provider, per day
Error rate -- failed AI calls with error details
Latency percentiles -- p50, p95, p99 response times
Session replay -- for agent workflows, replay the step-by-step reasoning
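
To sanity-check the latency percentiles against locally collected timings, a nearest-rank percentile is one common approach. How the dashboard computes its values is not documented here, so this sketch is an assumption for illustration only, and `percentile` is a hypothetical helper.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  if (p <= 0 || p > 100) throw new RangeError('p must be in (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point drift in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Example latencies in milliseconds (made-up values for illustration).
const latenciesMs = [120, 800, 450, 300, 950, 200, 600, 700, 150, 1000];
const p50 = percentile(latenciesMs, 50); // 450
const p95 = percentile(latenciesMs, 95); // 1000
```

With only ten samples, p95 and p99 both resolve to the slowest request, which is why tail percentiles are most informative over larger windows.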

Next Steps

Multi-Provider AI -- route between providers through one API
Smart Model Routing -- automatically pick the best model per task
Error Tracking -- capture application errors alongside AI traces