EMA-based learning that auto-tunes model selection and concurrency from execution history.

AdaptiveScheduler

The AdaptiveScheduler learns from execution history using exponential moving averages (EMA) to recommend the optimal model and concurrency level for each task type. It tracks duration, success rate, token usage, and cost per model/task-type pair, and automatically adjusts concurrency based on failure rates.

import { AdaptiveScheduler } from '@codmir/cortex';

Constructor

new AdaptiveScheduler(config?: Partial<AdaptiveSchedulerConfig>)

AdaptiveSchedulerConfig

interface AdaptiveSchedulerConfig {
  learningRate: number;           // default: 0.15
  minSamples: number;            // default: 5
  decayFactor: number;           // default: 0.95
  maxConcurrency: number;        // default: 32
  minConcurrency: number;        // default: 1
  adjustmentIntervalMs: number;  // default: 30000
}

Property	Default	Description
`learningRate`	`0.15`	EMA alpha for smoothing metrics
`minSamples`	`5`	Minimum samples before a model is recommended
`decayFactor`	`0.95`	Decay multiplier for p95 duration estimates
`maxConcurrency`	`32`	Upper bound for concurrent tasks
`minConcurrency`	`1`	Lower bound for concurrent tasks
`adjustmentIntervalMs`	`30000`	Minimum interval between concurrency adjustments

Initial concurrency is set to ceil(maxConcurrency / 2).
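As a rough illustration of how a partial config could resolve against the defaults listed above, the sketch below merges overrides and derives the starting concurrency. `DEFAULTS`, `resolveConfig`, and `initialConcurrency` are hypothetical helpers written for this example, not exports of `@codmir/cortex`. Only the default values and the ceil(maxConcurrency / 2) rule come from this page.

```typescript
interface AdaptiveSchedulerConfig {
  learningRate: number;
  minSamples: number;
  decayFactor: number;
  maxConcurrency: number;
  minConcurrency: number;
  adjustmentIntervalMs: number;
}

// Default values as documented in the table above.
const DEFAULTS: AdaptiveSchedulerConfig = {
  learningRate: 0.15,
  minSamples: 5,
  decayFactor: 0.95,
  maxConcurrency: 32,
  minConcurrency: 1,
  adjustmentIntervalMs: 30000,
};

// Hypothetical helper: overlay caller overrides on the defaults.
function resolveConfig(
  overrides: Partial<AdaptiveSchedulerConfig> = {},
): AdaptiveSchedulerConfig {
  return { ...DEFAULTS, ...overrides };
}

// Hypothetical helper: starting concurrency is half the upper bound, rounded up.
function initialConcurrency(config: AdaptiveSchedulerConfig): number {
  return Math.ceil(config.maxConcurrency / 2);
}
```

With the defaults this yields a starting concurrency of 16. With `maxConcurrency: 15` it yields 8.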

Methods

recordExecution()

Feeds execution data into the scheduler. Updates EMA statistics for the model/task-type pair and adjusts concurrency if enough time has passed since the last adjustment.

recordExecution(
  modelId: string,
  taskType: string,
  durationMs: number,
  tokensUsed: number,
  costUsd: number,
  success: boolean
): void

Parameter	Type	Description
`modelId`	`string`	The model that executed the task
`taskType`	`string`	Task type identifier (e.g., `'code-review'`, `'chat'`)
`durationMs`	`number`	Execution duration in milliseconds
`tokensUsed`	`number`	Total tokens consumed
`costUsd`	`number`	Cost in USD
`success`	`boolean`	Whether the execution succeeded

const scheduler = new AdaptiveScheduler();

scheduler.recordExecution('claude-sonnet', 'code-review', 2340, 1500, 0.0042, true);
scheduler.recordExecution('gpt-4o-mini', 'chat', 450, 800, 0.0001, true);
// A failed execution, recorded with zero tokens and zero cost
scheduler.recordExecution('claude-sonnet', 'code-review', 18000, 0, 0, false);
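For readers unfamiliar with exponential moving averages, the following is a minimal sketch of the standard EMA recurrence, with `learningRate` playing the role of alpha. The function names are hypothetical. Whether the scheduler uses exactly this form, or how it seeds the first sample, is not stated on this page, so treat it as an assumption.

```typescript
// Standard EMA step: the new average moves a fraction `alpha` of the way
// from the previous average toward the latest sample.
function emaUpdate(previous: number, sample: number, alpha: number): number {
  return alpha * sample + (1 - alpha) * previous;
}

// A success rate can be smoothed the same way by treating each outcome
// as 1 (success) or 0 (failure). This encoding is an assumption.
function emaSuccessRate(previous: number, success: boolean, alpha: number): number {
  return emaUpdate(previous, success ? 1 : 0, alpha);
}

// Example: an average of 2000 ms followed by a slow 4000 ms run, alpha = 0.15.
const updatedDurationMs = emaUpdate(2000, 4000, 0.15);
console.log(updatedDurationMs.toFixed(0)); // roughly 2300
```

A larger alpha makes the average react faster to recent executions, and a smaller alpha makes it smoother and slower to change.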

recommend()

Returns a scheduling recommendation for a task type based on accumulated statistics. Models with fewer than minSamples recorded executions for that task type are excluded.

recommend(
  taskType: string,
  availableModels: string[]
): SchedulingRecommendation

Parameter	Type	Description
`taskType`	`string`	The task type to get a recommendation for
`availableModels`	`string[]`	Model IDs to consider

const rec = scheduler.recommend('code-review', ['claude-sonnet', 'gpt-4o-mini']);

console.log(rec.optimalModel);        // 'claude-sonnet'
console.log(rec.optimalConcurrency);  // 16
console.log(rec.estimatedDurationMs); // 2340
console.log(rec.confidence);          // 0.6 (30 samples / 50)
console.log(rec.reasoning);           // 'claude-sonnet scores 82.3 (95% success, 2340ms avg, $0.0042/req) over 30 samples'
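The minSamples exclusion rule can be sketched as a simple filter. This page does not describe how the remaining candidates are scored, so the sketch stops at eligibility. `ModelSampleCount` and `eligibleModels` are hypothetical names used only for illustration.

```typescript
// Hypothetical, trimmed-down view of per-model statistics for one task type.
interface ModelSampleCount {
  modelId: string;
  sampleCount: number;
}

// Keep only models that are both available and have at least `minSamples`
// recorded executions ("fewer than minSamples" means excluded).
function eligibleModels(
  stats: ModelSampleCount[],
  availableModels: string[],
  minSamples: number = 5,
): string[] {
  return stats
    .filter((s) => availableModels.includes(s.modelId) && s.sampleCount >= minSamples)
    .map((s) => s.modelId);
}
```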

getConcurrency()

Returns the current concurrency level.

getConcurrency(): number

getFailureRate()

Returns the number of failures recorded within the given time window. Note that the return value is a count, not a ratio.

getFailureRate(windowMs?: number): number

Parameter	Type	Description
`windowMs`	`number` (optional)	Time window in ms (default: 60000)
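One common way to implement a windowed failure count is to keep failure timestamps and count those newer than a cutoff. The sketch below assumes that approach. `countFailures` is a hypothetical standalone function, and the scheduler's internal bookkeeping may differ.

```typescript
// Count failure timestamps (ms since epoch) that fall inside the last
// `windowMs` milliseconds relative to `now`.
function countFailures(
  failureTimestamps: number[],
  now: number,
  windowMs: number = 60000,
): number {
  const cutoff = now - windowMs;
  return failureTimestamps.filter((t) => t > cutoff).length;
}
```

Passing `now` explicitly, rather than calling `Date.now()` inside the function, keeps the sketch deterministic and easy to test.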

getStats()

Returns execution statistics, optionally filtered by model and/or task type.

getStats(modelId?: string, taskType?: string): ExecutionStats[]

Parameter	Type	Description
`modelId`	`string` (optional)	Filter by model ID
`taskType`	`string` (optional)	Filter by task type
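The optional filters behave like a conjunctive match in which an omitted argument matches everything. A minimal sketch of that logic, using the hypothetical names `StatsKey` and `filterStats`, might look like this:

```typescript
// Only the two key fields matter for filtering.
interface StatsKey {
  modelId: string;
  taskType: string;
}

// An undefined filter matches every entry; when both filters are given,
// an entry must match both.
function filterStats<T extends StatsKey>(
  all: T[],
  modelId?: string,
  taskType?: string,
): T[] {
  return all.filter(
    (s) =>
      (modelId === undefined || s.modelId === modelId) &&
      (taskType === undefined || s.taskType === taskType),
  );
}
```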

flush()

Clears all statistics, resets the failure window, and resets concurrency to ceil(maxConcurrency / 2).

flush(): void

Types

ExecutionStats

interface ExecutionStats {
  modelId: string;
  taskType: string;
  avgDurationMs: number;
  p95DurationMs: number;
  successRate: number;
  avgTokensUsed: number;
  avgCostUsd: number;
  sampleCount: number;
  lastUpdated: number;
}

SchedulingRecommendation

interface SchedulingRecommendation {
  optimalModel: string;
  optimalConcurrency: number;
  estimatedDurationMs: number;
  confidence: number;
  reasoning: string;
}

The confidence value ranges from 0 to 1, calculated as min(sampleCount / 50, 1).
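The confidence formula is a direct transcription of min(sampleCount / 50, 1). The function name `confidenceFor` is hypothetical.

```typescript
// Confidence grows linearly with the number of samples and saturates at 1
// once 50 or more samples have been recorded.
function confidenceFor(sampleCount: number): number {
  return Math.min(sampleCount / 50, 1);
}

console.log(confidenceFor(30)); // 0.6
console.log(confidenceFor(80)); // 1
```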

Concurrency Auto-Tuning

The scheduler automatically adjusts concurrency at most once per adjustmentIntervalMs:

If more than 5 failures occurred in the last 60 seconds and concurrency is above minConcurrency, concurrency is scaled down to 75% of its current value.
If there were zero failures and concurrency is below maxConcurrency, concurrency is incremented by 1.
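The two rules above can be sketched as a pure function. Several details are assumptions, since this page does not specify them: the scaled-down value is rounded down and clamped to minConcurrency, "zero failures" refers to the same 60-second window, and any other failure count leaves concurrency unchanged. `ConcurrencyBounds`, `shouldAdjust`, and `nextConcurrency` are hypothetical names.

```typescript
interface ConcurrencyBounds {
  minConcurrency: number;
  maxConcurrency: number;
}

// Adjustments are rate-limited to at most one per interval.
function shouldAdjust(now: number, lastAdjustedAt: number, intervalMs: number): boolean {
  return now - lastAdjustedAt >= intervalMs;
}

function nextConcurrency(
  current: number,
  recentFailures: number, // failures observed in the last 60 seconds
  bounds: ConcurrencyBounds,
): number {
  if (recentFailures > 5 && current > bounds.minConcurrency) {
    // Back off to 75% of current (floor rounding and the clamp are assumptions).
    return Math.max(bounds.minConcurrency, Math.floor(current * 0.75));
  }
  if (recentFailures === 0 && current < bounds.maxConcurrency) {
    // Probe upward slowly, one slot at a time.
    return current + 1;
  }
  // Between 1 and 5 failures, or already at a bound: hold steady.
  return current;
}
```

Multiplicative decrease paired with additive increase means the scheduler backs off quickly under sustained failures and recovers gradually once failures stop.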

Example

import { AdaptiveScheduler } from '@codmir/cortex';

const scheduler = new AdaptiveScheduler({
  learningRate: 0.2,
  maxConcurrency: 16,
  minSamples: 3,
});

// Feed execution data
for (let i = 0; i < 10; i++) {
  scheduler.recordExecution(
    'claude-sonnet',
    'code-review',
    2000 + Math.random() * 1000,
    1200 + Math.floor(Math.random() * 500),
    0.004,
    Math.random() > 0.1, // 90% success rate
  );
}

for (let i = 0; i < 10; i++) {
  scheduler.recordExecution(
    'gpt-4o-mini',
    'code-review',
    500 + Math.random() * 200,
    800,
    0.0001,
    Math.random() > 0.3, // 70% success rate
  );
}

// Get recommendation
const rec = scheduler.recommend('code-review', ['claude-sonnet', 'gpt-4o-mini']);
console.log(`Best model: ${rec.optimalModel}`);
console.log(`Concurrency: ${rec.optimalConcurrency}`);
console.log(`Confidence: ${(rec.confidence * 100).toFixed(0)}%`);

// Check stats
const stats = scheduler.getStats('claude-sonnet', 'code-review');
console.log(`Success rate: ${(stats[0].successRate * 100).toFixed(1)}%`);
console.log(`Avg duration: ${stats[0].avgDurationMs.toFixed(0)}ms`);