Scale to the moment.
Then back to zero.
Capacity arrives when traffic does, usually within 600 ms, and retires when traffic stops. You don't keep boxes warm for visitors who never show up.
Three checkpoints, one decision.
Every incoming request is metered. Concurrency, queue depth, and p95 latency feed a single signal — the scheduler reacts in milliseconds, not minutes.
Watch the queue
Each region samples concurrent requests, in-flight builds, and pending jobs every 200 ms.
Spin or sleep
Above the threshold, a fresh microVM is provisioned from a warm pool — below it, idle instances are reclaimed.
Cut traffic over
Once the health check passes, the edge router shifts new connections to the new instance, then drains the old one.
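As a rough illustration of the three checkpoints above, here is a minimal Python sketch of the spin-or-sleep decision. It is not the platform's scheduler: the field names, the per-instance limits, and the choice to combine the three inputs by taking the largest ratio are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One 200 ms reading for a region (field names are hypothetical)."""
    concurrency: int        # requests currently in flight
    queue_depth: int        # requests waiting for an instance
    p95_latency_ms: float   # recent 95th-percentile latency


@dataclass(frozen=True)
class Limits:
    """Per-instance targets; any numbers passed in are illustrative only."""
    concurrency_per_instance: int
    queue_per_instance: int
    p95_latency_ms: float


def load_signal(sample: Sample, limits: Limits, instances: int) -> float:
    """Fold the three inputs into one number; 1.0 means 'at the limit'.

    Assumption: the signal is the worst (largest) of the three ratios.
    """
    n = max(instances, 1)
    return max(
        sample.concurrency / (limits.concurrency_per_instance * n),
        sample.queue_depth / (limits.queue_per_instance * n),
        sample.p95_latency_ms / limits.p95_latency_ms,
    )


def decide(sample: Sample, limits: Limits, instances: int,
           idle_instances: int, threshold: float = 1.0) -> str:
    """Return 'spin', 'sleep', or 'hold' for one sampling tick."""
    if instances == 0:
        # Scaled to zero: any waiting work needs a first instance.
        has_work = sample.concurrency > 0 or sample.queue_depth > 0
        return "spin" if has_work else "hold"
    signal = load_signal(sample, limits, instances)
    if signal > threshold:
        return "spin"    # provision another microVM from the warm pool
    if signal < threshold and idle_instances > 0:
        return "sleep"   # reclaim an idle instance
    return "hold"
```

With the made-up limits `Limits(10, 5, 500.0)`, `decide(Sample(4, 0, 900.0), limits, instances=2, idle_instances=0)` returns `"spin"`: concurrency is low, but p95 latency is well past its limit. A production scheduler would likely add a gap between the scale-up and scale-down thresholds so the instance count does not flap.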
Capacity that follows traffic.
The defaults are good defaults.
Sub-second cold starts
microVM snapshots are cached at the edge, so p50 boot is 380 ms for a Node 20 app.
Scale-to-zero
No traffic for 5 minutes? The instance retires. The next request restores from a snapshot instead of booting fresh.
Concurrency-aware
Scaling is keyed off in-flight requests, not CPU. When a slow downstream leaves CPUs idle while requests pile up, you still scale.
Region-aware
Burst traffic from APAC scales APAC. You don't pay for capacity in regions that aren't seeing traffic.
Manual override
Pin a minimum instance count for that one endpoint your customers rely on. Set it in the dashboard or YAML.
Per-app metrics
Live charts of cold-start times, concurrency, and scale events. Drill in when something looks off.
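To make two of the defaults above concrete, the sketch below shows how the 5-minute scale-to-zero rule and a pinned minimum instance count might interact. Only the 5-minute figure comes from this page; the function name, its parameters, and the use of plain seconds for timestamps are hypothetical.

```python
IDLE_TIMEOUT_S = 5 * 60  # from the copy above: retire after 5 minutes without traffic


def instances_to_keep(last_request_at: float, now: float,
                      running: int, min_instances: int = 0) -> int:
    """Return how many instances should stay up at time `now`.

    Timestamps are in seconds. `min_instances` stands in for the manual
    override: a pinned floor that idle retirement never goes below.
    """
    idle_for = now - last_request_at
    if idle_for >= IDLE_TIMEOUT_S:
        # Idle long enough: retire everything except the pinned minimum.
        return min_instances
    # Still active: keep what is running, but never fewer than the floor.
    return max(running, min_instances)
```

For example, `instances_to_keep(0, 600, running=3)` returns `0`, while the same call with `min_instances=1` returns `1`, so a pinned endpoint never has to restore from a snapshot.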
Ship your first app today.
Closed beta. Onboarding a few builders each week — most projects are running within an hour of joining.