Automatically route AI tasks to the optimal model based on cost, latency, and quality.
Smart Model Routing
The SmartRouter in @codmir/cortex automatically selects the best AI model for each task. It scores models on cost, latency, and quality, uses circuit breakers to avoid unhealthy providers, and adapts over time based on success/failure feedback.
How SmartRouter Works
For each incoming task, the router:
- Filters out models with open circuit breakers (too many recent failures)
- Scores eligible models based on the selected strategy
- Returns the best model with a list of fallbacks
- Tracks success/failure to improve future decisions
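The first three steps can be sketched in a few lines. This is a minimal, self-contained illustration rather than the SmartRouter source: the `Candidate` and `Decision` types and the `route` function are hypothetical names, the scoring function is passed in so any strategy can be plugged in, and the feedback-tracking step is left out.

```typescript
// Hypothetical, simplified shape; the real ModelProfile has more fields.
interface Candidate {
  id: string;
  qualityScore: number; // 0..1, higher is better
  circuitOpen: boolean; // true = temporarily excluded from routing
}

interface Decision {
  model: Candidate;
  fallbacks: Candidate[];
}

// Filter, score, and rank candidates. `score` is pluggable so the same
// pipeline can serve any strategy.
function route(
  candidates: Candidate[],
  score: (m: Candidate) => number,
): Decision {
  // Step 1: drop models whose circuit breaker is open.
  const eligible = candidates.filter((m) => !m.circuitOpen);
  if (eligible.length === 0) {
    throw new Error('No healthy models available');
  }
  // Step 2: rank by score, best first (copy so the input is not mutated).
  const ranked = [...eligible].sort((a, b) => score(b) - score(a));
  // Step 3: return the best model, with the rest as ordered fallbacks.
  const [model, ...fallbacks] = ranked;
  return { model, fallbacks };
}

const sketchDecision = route(
  [
    { id: 'claude-sonnet-4-20250514', qualityScore: 0.95, circuitOpen: true },
    { id: 'gpt-4o', qualityScore: 0.9, circuitOpen: false },
    { id: 'gemini-2.5-flash', qualityScore: 0.75, circuitOpen: false },
  ],
  (m) => m.qualityScore,
);
console.log(sketchDecision.model.id); // 'gpt-4o'
console.log(sketchDecision.fallbacks.map((m) => m.id)); // ['gemini-2.5-flash']
```

In this run the highest-quality model is skipped because its circuit is open, so the next-best eligible model is chosen and the remaining one becomes the fallback.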
Define Model Profiles
Each model needs a profile that describes its cost, speed, and quality:
import { createCortex } from '@codmir/cortex';
import type { ModelProfile } from '@codmir/cortex';

const models: ModelProfile[] = [
  {
    id: 'claude-sonnet-4-20250514',
    provider: 'anthropic',
    tier: 'standard',
    costPer1kInput: 0.003,
    costPer1kOutput: 0.015,
    avgLatencyMs: 800,
    qualityScore: 0.95,
    capabilities: ['code', 'reasoning', 'analysis'],
    maxTokens: 200000,
    healthy: true,
    circuitOpen: false,
  },
  {
    id: 'gpt-4o',
    provider: 'openai',
    tier: 'standard',
    costPer1kInput: 0.005,
    costPer1kOutput: 0.015,
    avgLatencyMs: 600,
    qualityScore: 0.90,
    capabilities: ['code', 'reasoning', 'vision'],
    maxTokens: 128000,
    healthy: true,
    circuitOpen: false,
  },
  {
    id: 'gemini-2.5-flash',
    provider: 'gemini',
    tier: 'fast',
    costPer1kInput: 0.0001,
    costPer1kOutput: 0.0004,
    avgLatencyMs: 200,
    qualityScore: 0.75,
    capabilities: ['code', 'reasoning'],
    maxTokens: 1000000,
    healthy: true,
    circuitOpen: false,
  },
];

const cortex = createCortex({
  auth: { jwtSecret: process.env.JWT_SECRET! },
  models,
  router: {
    defaultStrategy: 'balanced',
    costWeight: 0.3,
    latencyWeight: 0.3,
    qualityWeight: 0.4,
  },
  threat: {},
  scheduler: {},
  healer: {},
  mesh: {},
});
Routing Strategies
The strategy, set through defaultStrategy in the router config or overridden per task, controls how the router scores models:
| Strategy | Behavior |
|---|---|
| cost | Cheapest model wins. Good for bulk operations. |
| latency | Fastest model wins. Good for real-time UX. |
| quality | Highest quality score wins. Good for critical decisions. |
| balanced | Weighted combination of cost, latency, and quality. Default. |
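To make the table concrete, here is one way the four strategies could be scored. The formulas are assumptions: the source does not say how SmartRouter normalizes its inputs, so this sketch uses min-max normalization over the candidate pool (inverted for cost and latency, where lower is better) and treats input price plus output price as the cost signal. `Profile`, `Weights`, and `rank` are hypothetical names.

```typescript
type Strategy = 'cost' | 'latency' | 'quality' | 'balanced';

// Hypothetical subset of the ModelProfile fields used for scoring.
interface Profile {
  id: string;
  costPer1kInput: number;
  costPer1kOutput: number;
  avgLatencyMs: number;
  qualityScore: number; // 0..1, higher is better
}

interface Weights {
  costWeight: number;
  latencyWeight: number;
  qualityWeight: number;
}

// Map a "lower is better" value onto 0..1, where 1 is the lowest value in the pool.
function lowerIsBetter(value: number, min: number, max: number): number {
  return max === min ? 1 : (max - value) / (max - min);
}

// Score every model under the given strategy and return them best-first.
function rank(models: Profile[], strategy: Strategy, w: Weights) {
  // Assumption: input price + output price stands in for "cost".
  const costs = models.map((m) => m.costPer1kInput + m.costPer1kOutput);
  const latencies = models.map((m) => m.avgLatencyMs);
  const minCost = Math.min(...costs);
  const maxCost = Math.max(...costs);
  const minLat = Math.min(...latencies);
  const maxLat = Math.max(...latencies);

  return models
    .map((m, i) => {
      const cost = lowerIsBetter(costs[i], minCost, maxCost);
      const latency = lowerIsBetter(latencies[i], minLat, maxLat);
      const quality = m.qualityScore;
      let score: number;
      switch (strategy) {
        case 'cost': score = cost; break;
        case 'latency': score = latency; break;
        case 'quality': score = quality; break;
        default: // 'balanced': weighted sum of the three normalized signals
          score = w.costWeight * cost + w.latencyWeight * latency + w.qualityWeight * quality;
      }
      return { id: m.id, score };
    })
    .sort((a, b) => b.score - a.score);
}

const profiles: Profile[] = [
  { id: 'claude-sonnet-4-20250514', costPer1kInput: 0.003, costPer1kOutput: 0.015, avgLatencyMs: 800, qualityScore: 0.95 },
  { id: 'gpt-4o', costPer1kInput: 0.005, costPer1kOutput: 0.015, avgLatencyMs: 600, qualityScore: 0.9 },
  { id: 'gemini-2.5-flash', costPer1kInput: 0.0001, costPer1kOutput: 0.0004, avgLatencyMs: 200, qualityScore: 0.75 },
];
const weights: Weights = { costWeight: 0.3, latencyWeight: 0.3, qualityWeight: 0.4 };

console.log(rank(profiles, 'quality', weights)[0].id); // 'claude-sonnet-4-20250514'
console.log(rank(profiles, 'balanced', weights)[0].id); // 'gemini-2.5-flash'
```

Under these assumed formulas, quality picks Claude Sonnet, while balanced picks Gemini Flash because it is both the cheapest and the fastest model in this pool. Raising qualityWeight relative to the other two weights shifts the balance back toward higher-quality models.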
Route a Task
const decision = cortex.routeTask(
  {
    id: 'task-1',
    type: 'code_review',
    input: 'Review this PR for security issues',
    maxTokens: 4096,
  },
  'quality', // Override strategy for this task
);
console.log(decision.model.id); // 'claude-sonnet-4-20250514'
console.log(decision.reason); // Why this model was chosen
console.log(decision.estimatedCostUsd); // Estimated cost
console.log(decision.estimatedLatencyMs); // Expected response time
console.log(decision.fallbacks); // Backup models if primary fails
Execute with Automatic Routing
const result = await cortex.executeTask({
  id: 'task-2',
  type: 'code_generation',
  input: 'Write a REST endpoint for user registration',
  maxTokens: 8192,
});
console.log(result.model); // Which model was used
console.log(result.durationMs); // Actual execution time
console.log(result.tokensUsed); // Actual tokens consumed
console.log(result.costUsd); // Actual cost
Recording Success and Failure
The router uses circuit breakers to avoid models that are failing. Record outcomes to improve routing:
// After a successful call
cortex.router.recordSuccess('claude-sonnet-4-20250514');
// After a failed call
cortex.router.recordFailure('gpt-4o');
When failures exceed the circuit breaker threshold (default: 5), the model is temporarily excluded from routing. The circuit resets after the configured timeout (default: 60 seconds).
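The threshold-and-timeout behavior can be sketched as a small state machine. This is an illustrative implementation, not the library's: the `CircuitBreaker` class is hypothetical, it assumes the threshold counts consecutive failures (a success resets the count) and that the circuit opens once the count reaches the threshold, and it takes an injectable clock so the timeout can be exercised without waiting.

```typescript
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly resetMs: number;
  private readonly now: () => number;

  constructor(threshold = 5, resetMs = 60_000, now: () => number = () => Date.now()) {
    this.threshold = threshold;
    this.resetMs = resetMs;
    this.now = now;
  }

  // A success clears the failure count and closes the circuit.
  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  // Count the failure; open the circuit once the threshold is reached.
  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.threshold) {
      this.openedAt = this.now();
    }
  }

  // True while the model should be excluded from routing.
  isOpen(): boolean {
    if (this.openedAt === null) return false;
    if (this.now() - this.openedAt >= this.resetMs) {
      // Timeout elapsed: close the circuit and give the model another chance.
      this.failures = 0;
      this.openedAt = null;
      return false;
    }
    return true;
  }
}

// Usage with a fake clock so the 60-second reset can be simulated.
let fakeNow = 0;
const breaker = new CircuitBreaker(5, 60_000, () => fakeNow);
for (let i = 0; i < 5; i++) breaker.recordFailure();
console.log(breaker.isOpen()); // true: excluded from routing
fakeNow += 60_000;
console.log(breaker.isOpen()); // false: eligible again after the timeout
```

Production breakers often add a half-open state that lets a single trial request through before fully closing; that refinement is omitted here to keep the sketch short.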
Circuit Breaker Configuration
const cortex = createCortex({
  // ...
  router: {
    defaultStrategy: 'balanced',
    circuitBreakerThreshold: 5, // Open circuit after 5 failures
    circuitBreakerResetMs: 60000, // Try again after 60 seconds
    costWeight: 0.3,
    latencyWeight: 0.3,
    qualityWeight: 0.4,
  },
});
Health Monitoring
Get a full health report across all models:
const report = cortex.getIntelligenceReport();
console.log(report.router); // Per-model health, circuit states
console.log(report.scheduler); // Concurrency, failure rate, execution stats
console.log(report.healer); // Process health, recent healing events
console.log(report.sessions); // Active session count
Adaptive Scheduling
The AdaptiveScheduler works alongside the router to optimize concurrency and retry strategies:
// Get scheduling recommendation for a task type
const recommendation = cortex.scheduler.recommend('code_review', [
  'claude-sonnet-4-20250514',
  'gpt-4o',
]);
console.log(recommendation.modelId); // Recommended model
console.log(recommendation.concurrency); // Suggested parallelism
console.log(recommendation.timeout); // Suggested timeout
Next Steps
- Multi-Provider AI -- use the unified AI client that sits on top of routing
- Swarm Execution -- route tasks across a swarm of agents
- Edge Authentication -- protect your AI endpoints with threat detection