Let Overseer detect, fix, and deploy patches while you sleep.
Self-Healing with Overseer
Overseer can autonomously detect production errors, trace their root cause, generate a fix, run tests, and deploy a patch -- all without waking you up. This guide explains how the self-healing loop works and how to configure it.
How Self-Healing Works
The loop follows five stages:
- Detect -- Overseer captures an error spike, build failure, or latency anomaly via the EventWatcher
- Trace -- The DecisionEngine evaluates the event and decides whether to act silently, notify you, or barge in with voice
- Fix -- The TicketPipeline or IncidentPipeline spawns an agent task to investigate and produce a patch
- Test -- The agent runs the project's test suite against the patch
- Deploy -- On success, the patch is committed and deployed (with optional human approval gate)
Error Detected → Decision Engine → Agent Fix → Tests → Deploy
↓
Notify / Barge-In
Configure the Orchestrator
The Orchestrator class from @codmir/overseer wires all components together:
import { createOrchestrator } from '@codmir/overseer';

const orchestrator = createOrchestrator({
  overseer: {
    projectId: 'my-project',
    organizationId: 'my-org',
    apiKey: process.env.CODMIR_API_KEY!,
  },
  defaultRepoUrl: 'https://github.com/my-org/my-app',
  defaultBranch: 'main',
  eventWatcher: {
    sources: ['overseer', 'github', 'vercel'],
  },
});

await orchestrator.start();

Once started, the orchestrator:
- Watches for error spikes, deploy failures, and ticket events
- Processes the ticket queue every 30 seconds
- Sends heartbeats every 10 seconds
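To make the cadence concrete, here is a minimal sketch of the two periodic jobs. The job names and helper functions are hypothetical and not part of @codmir/overseer. The sketch also assumes each job first runs one full interval after start, which this guide does not confirm.

```typescript
// Hypothetical model of the periodic work listed above.
interface PeriodicJob {
  name: string;
  everySeconds: number;
}

// Intervals taken from this guide; job names are illustrative only.
const JOBS: PeriodicJob[] = [
  { name: 'processTicketQueue', everySeconds: 30 },
  { name: 'sendHeartbeat', everySeconds: 10 },
];

// Names of the jobs that fire at a given whole second after start.
// Assumption: nothing fires at second 0.
function jobsDueAt(second: number, jobs: PeriodicJob[] = JOBS): string[] {
  if (second <= 0) return [];
  return jobs
    .filter((job) => second % job.everySeconds === 0)
    .map((job) => job.name);
}

// How many times each job fires within a window of the given length.
function runsWithin(
  windowSeconds: number,
  jobs: PeriodicJob[] = JOBS,
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const job of jobs) {
    counts[job.name] = Math.floor(windowSeconds / job.everySeconds);
  }
  return counts;
}

// Under these assumptions, one minute of uptime means
// two ticket-queue passes and six heartbeats.
const perMinute = runsWithin(60);
```

Every third heartbeat coincides with a ticket-queue pass in this model, so `jobsDueAt(30)` lists both jobs.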
Decision Engine
The DecisionEngine evaluates each event and decides what to do:
| Decision | Behavior |
|---|---|
| execute_silent | Fix the issue autonomously, notify when done |
| barge_in | Activate voice mode to alert the user immediately |
| notify | Send a notification without taking action |
| queue | Add to the task queue for later processing |
| ignore | Skip the event |
Critical incidents (service down, error spike) trigger barge_in. Routine tickets trigger execute_silent.
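The routing described above could be sketched as a single function. This is an illustration and not the actual DecisionEngine: the `WatchedEvent` shape, the `agentBusy` flag, and the rules for `notify`, `queue`, and `ignore` are assumptions. Only the `barge_in` and `execute_silent` rules come from this guide.

```typescript
type Decision = 'execute_silent' | 'barge_in' | 'notify' | 'queue' | 'ignore';

// Hypothetical event shape, for illustration only.
interface WatchedEvent {
  kind: 'service_down' | 'error_spike' | 'ticket' | 'deploy_failure' | 'other';
  agentBusy?: boolean; // assumption: true when no agent is free to take the task
}

function decide(event: WatchedEvent): Decision {
  switch (event.kind) {
    case 'service_down':
    case 'error_spike':
      // From the guide: critical incidents trigger a voice barge-in.
      return 'barge_in';
    case 'ticket':
      // From the guide: routine tickets are fixed silently.
      // Assumption: they are queued instead when the agent is busy.
      return event.agentBusy ? 'queue' : 'execute_silent';
    case 'deploy_failure':
      // Assumption: deploy failures only notify.
      return 'notify';
    default:
      // Assumption: anything unrecognized is skipped.
      return 'ignore';
  }
}

const decisionForOutage = decide({ kind: 'service_down' });
const decisionForTicket = decide({ kind: 'ticket' });
```

Keeping the mapping in one pure function makes it easy to unit-test each rule before wiring it to live events.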
Approval Gates
For dangerous operations, the orchestrator requests approval before executing:
import { createOrchestrator } from '@codmir/overseer';

const orchestrator = createOrchestrator({
  overseer: {
    projectId: 'my-project',
    organizationId: 'my-org',
    apiKey: process.env.CODMIR_API_KEY!,
    approvalRequired: true,
  },
  defaultRepoUrl: 'https://github.com/my-org/my-app',
  defaultBranch: 'main',
});

When approvalRequired is true, the orchestrator emits an approval_requested event and waits for confirmation before deploying fixes. This is the recommended setting for production environments.
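The gate's behavior can be modelled with a small state machine. This is a sketch of the idea, not the orchestrator's implementation: the `ApprovalGate` class, its method names, and the `emitted` log are hypothetical. Only the `approvalRequired` flag and the `approval_requested` event name come from this guide.

```typescript
type ApprovalStatus = 'pending' | 'approved' | 'rejected';

// Hypothetical stand-in for the approval step that sits before deployment.
class ApprovalGate {
  private readonly approvalRequired: boolean;
  private readonly requests = new Map<string, ApprovalStatus>();
  // Names of events this sketch would emit, recorded for inspection.
  readonly emitted: string[] = [];

  constructor(approvalRequired: boolean) {
    this.approvalRequired = approvalRequired;
  }

  // Called once a patch has passed tests.
  // Returns true when the deploy may proceed immediately.
  request(patchId: string): boolean {
    if (!this.approvalRequired) return true;
    this.requests.set(patchId, 'pending');
    this.emitted.push('approval_requested');
    return false;
  }

  // Records a human decision for a pending request.
  resolve(patchId: string, approved: boolean): void {
    if (this.requests.get(patchId) !== 'pending') {
      throw new Error(`No pending approval for ${patchId}`);
    }
    this.requests.set(patchId, approved ? 'approved' : 'rejected');
  }

  canDeploy(patchId: string): boolean {
    return !this.approvalRequired || this.requests.get(patchId) === 'approved';
  }
}

const productionGate = new ApprovalGate(true);
const deployedImmediately = productionGate.request('patch-1'); // false: waits for a human
productionGate.resolve('patch-1', true);
const deployableAfterApproval = productionGate.canDeploy('patch-1');
```

With the flag off, `request` returns true and no event is recorded, which matches the fully autonomous mode.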
Viewing Healing Events
Check What Overseer Is Working On
// For voice mode: "What are you working on?"
const summary = orchestrator.getStatusSummary();
console.log(summary);

// For dashboards: full state
const state = orchestrator.getState();
console.log(state);

Check If a Ticket Is Already Being Handled
const isHandled = orchestrator.isWorkingOn('TICKET-123');
if (isHandled) {
  console.log('Overseer is already on it');
}

Incident Pipeline (Barge-In)
For critical incidents, Overseer can barge in via voice:
// Register a connected client for voice notifications
const unregister = orchestrator.registerClient({
  userId: 'user-123',
  clientId: 'desktop-abc',
  type: 'desktop',
  capabilities: ['voice', 'notifications'],
});

// Set notification sink for non-voice notifications
orchestrator.setNotificationSink({
  async send(userId, title, body, metadata) {
    // Push to inbox, email, mobile, etc.
  },
});

When a critical incident occurs (e.g., error_spike, service_down), the IncidentPipeline broadcasts to all online clients. The highest-priority client (desktop > web in voice > web > mobile) gets the voice session.
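The priority rule can be sketched as a lookup against an ordered list. The type labels other than 'desktop', the `online` field, and the `pickVoiceClient` helper are assumptions made for this example. The real IncidentPipeline may represent clients differently.

```typescript
// Hypothetical labels: only 'desktop' appears verbatim in this guide.
type ClientType = 'desktop' | 'web_voice' | 'web' | 'mobile';

interface ConnectedClient {
  clientId: string;
  type: ClientType;
  online: boolean;
}

// Highest priority first: desktop > web in voice > web > mobile.
const VOICE_PRIORITY: ClientType[] = ['desktop', 'web_voice', 'web', 'mobile'];

// Returns the online client that should receive the voice session,
// or undefined when nobody is online.
function pickVoiceClient(clients: ConnectedClient[]): ConnectedClient | undefined {
  const online = clients.filter((client) => client.online);
  online.sort(
    (a, b) => VOICE_PRIORITY.indexOf(a.type) - VOICE_PRIORITY.indexOf(b.type),
  );
  return online[0];
}

const chosenClient = pickVoiceClient([
  { clientId: 'phone-1', type: 'mobile', online: true },
  { clientId: 'desktop-abc', type: 'desktop', online: false },
  { clientId: 'browser-7', type: 'web', online: true },
]);
```

Here the desktop client is offline, so the session falls through to the web client rather than the phone.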
Rollback Behavior
If a self-healing patch fails tests or causes a regression after deployment:
- The fix is automatically reverted
- The incident is escalated to the next notification tier
- The event is logged with outcome: 'failure' in the timeline
The escalation chain follows: inbox, email, mobile push, phone call, desktop TTS barge-in.
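The rollback and escalation rules can be sketched as two small functions. The tier identifiers, the `HealingResult` and `TimelineEntry` shapes, and the 'success' outcome are assumptions. This guide only shows outcome: 'failure' and the order of the chain. The sketch also assumes escalation stays at the last tier once the chain is exhausted.

```typescript
// Chain order from this guide; identifier spellings are assumed.
const ESCALATION_CHAIN = [
  'inbox',
  'email',
  'mobile_push',
  'phone_call',
  'desktop_tts_barge_in',
] as const;
type Tier = (typeof ESCALATION_CHAIN)[number];

// Moves one step up the chain and stays at the last tier when exhausted.
function nextTier(current: Tier): Tier {
  const index = ESCALATION_CHAIN.indexOf(current);
  const next = ESCALATION_CHAIN[Math.min(index + 1, ESCALATION_CHAIN.length - 1)];
  return next ?? current;
}

// Hypothetical shapes for a finished healing attempt and its timeline entry.
interface HealingResult {
  testsPassed: boolean;
  regressionAfterDeploy: boolean;
}

interface TimelineEntry {
  outcome: 'success' | 'failure';
  reverted: boolean;
  notifyTier: Tier;
}

function settle(result: HealingResult, currentTier: Tier): TimelineEntry {
  const failed = !result.testsPassed || result.regressionAfterDeploy;
  if (!failed) {
    return { outcome: 'success', reverted: false, notifyTier: currentTier };
  }
  // Failure path: revert the fix, escalate one tier, log the failure.
  return { outcome: 'failure', reverted: true, notifyTier: nextTier(currentTier) };
}

const failedAttempt = settle(
  { testsPassed: true, regressionAfterDeploy: true },
  'email',
);
```

In this example a post-deploy regression reported at the email tier is reverted and escalated to mobile push.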
Webhooks
Process incoming webhooks from GitHub, Vercel, or other providers:
// Process a GitHub webhook
await orchestrator.processWebhook('github', {
  action: 'completed',
  workflow_run: { conclusion: 'failure' },
  repository: { full_name: 'my-org/my-app' },
});

Stopping the Orchestrator
orchestrator.stop();

This cancels all in-flight tasks, stops the event watcher, and clears all intervals.
Next Steps
- Error Tracking -- set up the error capture that feeds self-healing
- Cortex Protocol -- give agents full project context for better fixes
- Swarm Execution -- scale fixes across multiple agents