Track every LLM call — tokens, cost, latency, and errors — with one import.
AI Monitoring
This guide shows you how to track every AI/LLM call in your application -- tokens consumed, cost per request, latency, and errors -- using the Codmir SDK.
Install
npm install @codmir/sdk
Initialize
Add the SDK to your application entry point. The traceAI flag, shown in the framework-specific init call, enables automatic AI call instrumentation:
import { CodmirClient } from '@codmir/sdk';
const codmir = new CodmirClient({
  apiKey: process.env.CODMIR_API_KEY,
  baseUrl: 'https://codmir.com/api',
});
For framework-specific initialization:
import * as Codmir from '@codmir/sdk/nextjs';
Codmir.init({
  dsn: process.env.NEXT_PUBLIC_CODMIR_DSN,
  traceAI: true,
});
Using CodemirAI for Traced Calls
The @codmir/ai package provides a unified client that automatically emits events for monitoring:
import { createCodemirAI } from '@codmir/ai';
const ai = createCodemirAI({
  defaultProvider: 'anthropic',
  context: {
    projectId: 'my-project',
    name: 'My App',
    techStack: ['TypeScript', 'Next.js', 'Prisma'],
  },
  governance: {
    enabled: true,
    rateLimitPerMinute: 60,
  },
});
Every call through CodemirAI emits structured events with token counts, cost, and timing.
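To make the rateLimitPerMinute setting concrete, here is a minimal sketch of how a per-minute cap could be enforced with a sliding window. It is not the @codmir/ai governance implementation, which this guide does not describe; SlidingWindowLimiter and tryAcquire are hypothetical names used only for illustration.

```typescript
// Hypothetical sliding-window limiter. Illustrates a per-minute cap;
// it is NOT the actual governance logic inside @codmir/ai.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private readonly limitPerMinute: number;
  private readonly windowMs: number;

  constructor(limitPerMinute: number, windowMs: number = 60_000) {
    this.limitPerMinute = limitPerMinute;
    this.windowMs = windowMs;
  }

  // Returns true if a call is allowed at time `now` (in ms) and records it.
  tryAcquire(now: number = Date.now()): boolean {
    // Keep only the calls that are still inside the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length >= this.limitPerMinute) {
      return false; // cap reached; caller should wait or reject
    }
    this.timestamps.push(now);
    return true;
  }
}

// Usage: mirror the rateLimitPerMinute: 60 value from the config above.
const limiter = new SlidingWindowLimiter(60);
if (!limiter.tryAcquire()) {
  console.warn('Rate limit hit: 60/min');
}
```

A sliding window avoids the burst at minute boundaries that a fixed per-minute counter allows, at the cost of keeping one timestamp per recent call.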
Simple Chat
const response = await ai.chat('Explain the auth flow in this codebase');
console.log(response);
Completion with Full Control
const response = await ai.complete({
  messages: [
    { role: 'system', content: 'You are a code reviewer.' },
    { role: 'user', content: 'Review this function for security issues.' },
  ],
  provider: 'anthropic',
  model: 'claude-sonnet-4-20250514',
  maxTokens: 2048,
});
console.log(response.content);
console.log(response.usage); // { inputTokens, outputTokens, totalTokens }
Streaming
for await (const chunk of ai.stream({
  messages: [{ role: 'user', content: 'Write a migration script' }],
})) {
  process.stdout.write(chunk.content);
}
Listening to AI Events
Subscribe to events for custom monitoring, logging, or alerting:
import { createCodemirAI } from '@codmir/ai';
const ai = createCodemirAI({ defaultProvider: 'anthropic' });
// Track all events
ai.on('*', (event) => {
  console.log(`[${event.type}]`, event.data);
});
// Track completions specifically
ai.on('request_completed', (event) => {
  const { provider, usage, requestId } = event.data;
  console.log(`Provider: ${provider}`);
  console.log(`Tokens: ${usage.totalTokens}`);
});
// Track failures
ai.on('request_failed', (event) => {
  console.error(`AI request ${event.data.requestId} failed:`, event.data.error);
});
// Track rate limits
ai.on('rate_limit_reached', (event) => {
  console.warn(`Rate limit hit: ${event.data.limit}/min`);
});
// Track streaming
ai.on('stream_started', (event) => {
  console.log(`Stream started for ${event.data.provider}`);
});
Available event types: request_started, request_completed, request_failed, stream_started, stream_chunk, stream_ended, rate_limit_reached, governance_decision, context_updated, approval_required.
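One way to turn these events into metrics is to fold them into a running tally. The sketch below assumes only the event fields that appear in the handlers above (type, data.provider, data.usage.totalTokens, data.requestId, data.error). The AIEvent and UsageTally types and the createTally, recordEvent, and errorRate helpers are hypothetical names, and the real event payloads may carry more or differently named fields.

```typescript
// Assumed minimal event shape, inferred from the handlers in this guide.
type AIEvent = {
  type: string;
  data: {
    provider?: string;
    requestId?: string;
    usage?: { totalTokens: number };
    error?: unknown;
  };
};

type UsageTally = {
  completed: number;
  failed: number;
  tokensByProvider: Record<string, number>;
};

function createTally(): UsageTally {
  return { completed: 0, failed: 0, tokensByProvider: {} };
}

// Folds one event into the tally; other event types are ignored.
function recordEvent(tally: UsageTally, event: AIEvent): void {
  if (event.type === 'request_completed') {
    tally.completed += 1;
    const provider = event.data.provider ?? 'unknown';
    const tokens = event.data.usage?.totalTokens ?? 0;
    tally.tokensByProvider[provider] =
      (tally.tokensByProvider[provider] ?? 0) + tokens;
  } else if (event.type === 'request_failed') {
    tally.failed += 1;
  }
}

// Fraction of finished requests that failed (0 when nothing has finished).
function errorRate(tally: UsageTally): number {
  const finished = tally.completed + tally.failed;
  return finished === 0 ? 0 : tally.failed / finished;
}
```

Wiring this up would then take a single wildcard subscription, for example ai.on('*', (event) => recordEvent(tally, event)), with the tally flushed to your own logging or alerting on a timer.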
Cost Tracking per Model
Check which providers are available and their status:
const statuses = ai.getProviderStatus();
for (const status of statuses) {
  console.log(`${status.name}: ${status.configured ? 'ready' : 'not configured'}`);
}
For per-model cost tracking, use the Cortex SmartRouter, which scores models on cost, latency, and quality:
import { createCortex } from '@codmir/cortex';
const cortex = createCortex({
  // ... config
  models: [
    {
      id: 'claude-sonnet-4-20250514',
      provider: 'anthropic',
      tier: 'standard',
      costPer1kInput: 0.003,
      costPer1kOutput: 0.015,
      avgLatencyMs: 800,
      qualityScore: 0.95,
      capabilities: ['code', 'reasoning', 'analysis'],
      maxTokens: 200000,
      healthy: true,
      circuitOpen: false,
    },
  ],
});
const report = cortex.getIntelligenceReport();
console.log(report.router); // health per model
console.log(report.scheduler); // concurrency, failure rate, cost stats
Dashboard
In the Codmir dashboard, the AI Monitoring view shows:
- Waterfall view -- every AI call in sequence with timing bars
- Token usage -- input and output tokens per request
- Cost breakdown -- cost per model, per provider, per day
- Error rate -- failed AI calls with error details
- Latency percentiles -- p50, p95, p99 response times
- Session replay -- for agent workflows, replay the step-by-step reasoning
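To reproduce the cost and latency figures outside the dashboard, the sketch below shows the arithmetic under stated assumptions: cost is derived from per-1k-token prices like the costPer1kInput and costPer1kOutput fields in the Cortex config, and percentiles use the nearest-rank method. The CallRecord and ModelPricing types and both helper functions are hypothetical, and the dashboard's own calculation is not documented here and may differ.

```typescript
// Hypothetical record of one AI call, as you might log it yourself.
type CallRecord = {
  model: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
};

// Prices per 1,000 tokens, mirroring the field names in the Cortex config.
type ModelPricing = { costPer1kInput: number; costPer1kOutput: number };

// Estimated cost of one call, in the same currency unit as the pricing.
function estimateCost(call: CallRecord, pricing: ModelPricing): number {
  return (
    (call.inputTokens / 1000) * pricing.costPer1kInput +
    (call.outputTokens / 1000) * pricing.costPer1kOutput
  );
}

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) {
    throw new Error('cannot take a percentile of an empty list');
  }
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Usage with the example pricing from the Cortex config above.
const pricing: ModelPricing = { costPer1kInput: 0.003, costPer1kOutput: 0.015 };
const call: CallRecord = {
  model: 'claude-sonnet-4-20250514',
  inputTokens: 1000,
  outputTokens: 500,
  latencyMs: 820,
};
console.log(estimateCost(call, pricing).toFixed(4)); // prints "0.0105"
```

Nearest-rank always returns an observed latency rather than an interpolated one, which keeps p95 and p99 easy to trace back to a specific request.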
Next Steps
- Multi-Provider AI -- route between providers through one API
- Smart Model Routing -- automatically pick the best model per task
- Error Tracking -- capture application errors alongside AI traces