EMA-based learning that auto-tunes model selection and concurrency from execution history.
AdaptiveScheduler
The AdaptiveScheduler learns from execution history using exponential moving averages (EMA) to recommend the optimal model and concurrency level for each task type. It tracks duration, success rate, token usage, and cost per model/task-type pair, and automatically adjusts concurrency based on failure rates.
import { AdaptiveScheduler } from '@codmir/cortex';Constructor
new AdaptiveScheduler(config?: Partial<AdaptiveSchedulerConfig>)AdaptiveSchedulerConfig
interface AdaptiveSchedulerConfig {
learningRate: number; // default: 0.15
minSamples: number; // default: 5
decayFactor: number; // default: 0.95
maxConcurrency: number; // default: 32
minConcurrency: number; // default: 1
adjustmentIntervalMs: number; // default: 30000
}| Property | Default | Description |
|---|---|---|
learningRate | 0.15 | EMA alpha for smoothing metrics |
minSamples | 5 | Minimum samples before a model is recommended |
decayFactor | 0.95 | Decay multiplier for p95 duration estimates |
maxConcurrency | 32 | Upper bound for concurrent tasks |
minConcurrency | 1 | Lower bound for concurrent tasks |
adjustmentIntervalMs | 30000 | Minimum interval between concurrency adjustments |
Initial concurrency is set to ceil(maxConcurrency / 2).
Methods
recordExecution()
Feeds execution data into the scheduler. Updates EMA statistics for the model/task-type pair and adjusts concurrency if enough time has passed since the last adjustment.
recordExecution(
modelId: string,
taskType: string,
durationMs: number,
tokensUsed: number,
costUsd: number,
success: boolean
): void| Parameter | Type | Description |
|---|---|---|
modelId | string | The model that executed the task |
taskType | string | Task type identifier (e.g., 'code-review', 'chat') |
durationMs | number | Execution duration in milliseconds |
tokensUsed | number | Total tokens consumed |
costUsd | number | Cost in USD |
success | boolean | Whether the execution succeeded |
scheduler.recordExecution('claude-sonnet', 'code-review', 2340, 1500, 0.0042, true);
scheduler.recordExecution('gpt-4o-mini', 'chat', 450, 800, 0.0001, true);
scheduler.recordExecution('claude-sonnet', 'code-review', 18000, 0, 0, false);recommend()
Returns a scheduling recommendation for a task type based on accumulated statistics. Models with fewer than minSamples are excluded.
recommend(
taskType: string,
availableModels: string[]
): SchedulingRecommendation| Parameter | Type | Description |
|---|---|---|
taskType | string | The task type to get a recommendation for |
availableModels | string[] | Model IDs to consider |
const rec = scheduler.recommend('code-review', ['claude-sonnet', 'gpt-4o-mini']);
console.log(rec.optimalModel); // 'claude-sonnet'
console.log(rec.optimalConcurrency); // 16
console.log(rec.estimatedDurationMs); // 2340
console.log(rec.confidence); // 0.6 (30 samples / 50)
console.log(rec.reasoning); // 'claude-sonnet scores 82.3 (95% success, 2340ms avg, $0.0042/req) over 30 samples'getConcurrency()
Returns the current concurrency level.
getConcurrency(): numbergetFailureRate()
Returns the number of failures in the given time window.
getFailureRate(windowMs?: number): number| Parameter | Type | Description |
|---|---|---|
windowMs | number (optional) | Time window in ms (default: 60000) |
getStats()
Returns execution statistics, optionally filtered by model and/or task type.
getStats(modelId?: string, taskType?: string): ExecutionStats[]| Parameter | Type | Description |
|---|---|---|
modelId | string (optional) | Filter by model ID |
taskType | string (optional) | Filter by task type |
flush()
Clears all statistics, resets the failure window, and resets concurrency to ceil(maxConcurrency / 2).
flush(): voidTypes
ExecutionStats
interface ExecutionStats {
modelId: string;
taskType: string;
avgDurationMs: number;
p95DurationMs: number;
successRate: number;
avgTokensUsed: number;
avgCostUsd: number;
sampleCount: number;
lastUpdated: number;
}SchedulingRecommendation
interface SchedulingRecommendation {
optimalModel: string;
optimalConcurrency: number;
estimatedDurationMs: number;
confidence: number;
reasoning: string;
}The confidence value ranges from 0 to 1, calculated as min(sampleCount / 50, 1).
Concurrency Auto-Tuning
The scheduler automatically adjusts concurrency at most once per adjustmentIntervalMs:
- If more than 5 failures in the last 60 seconds and concurrency is above
minConcurrency: scale down to 75% of current. - If zero failures and concurrency is below
maxConcurrency: increment by 1.
Example
import { AdaptiveScheduler } from '@codmir/cortex';
const scheduler = new AdaptiveScheduler({
learningRate: 0.2,
maxConcurrency: 16,
minSamples: 3,
});
// Feed execution data
for (let i = 0; i < 10; i++) {
scheduler.recordExecution(
'claude-sonnet',
'code-review',
2000 + Math.random() * 1000,
1200 + Math.floor(Math.random() * 500),
0.004,
Math.random() > 0.1, // 90% success rate
);
}
for (let i = 0; i < 10; i++) {
scheduler.recordExecution(
'gpt-4o-mini',
'code-review',
500 + Math.random() * 200,
800,
0.0001,
Math.random() > 0.3, // 70% success rate
);
}
// Get recommendation
const rec = scheduler.recommend('code-review', ['claude-sonnet', 'gpt-4o-mini']);
console.log(`Best model: ${rec.optimalModel}`);
console.log(`Concurrency: ${rec.optimalConcurrency}`);
console.log(`Confidence: ${(rec.confidence * 100).toFixed(0)}%`);
// Check stats
const stats = scheduler.getStats('claude-sonnet', 'code-review');
console.log(`Success rate: ${(stats[0].successRate * 100).toFixed(1)}%`);
console.log(`Avg duration: ${stats[0].avgDurationMs.toFixed(0)}ms`);