Autoscaling

Scale to the moment.
Then back to zero.

Capacity arrives when traffic does — usually in 600 ms or less — and quietly retires when things go quiet. You don't keep boxes warm for visitors who never showed up.

How it works

Three checkpoints, one decision.

Every incoming request is metered. Concurrency, queue depth, and p95 latency feed a single signal — the scheduler reacts in milliseconds, not minutes.
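One way to picture the single signal is as a pressure score. The sketch below is illustrative only: the page does not say how the three inputs are combined, so the max-of-ratios rule, the field names, and the target values are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One metering sample. Field names are illustrative, not a real API."""
    concurrency: int        # in-flight requests
    queue_depth: int        # requests waiting for a slot
    p95_latency_ms: float   # 95th-percentile latency over the sample window


# Hypothetical targets; the real values are not given in the text.
TARGET_CONCURRENCY = 50
TARGET_QUEUE_DEPTH = 10
TARGET_P95_MS = 250.0


def pressure(sample: Sample) -> float:
    """Return the largest ratio of observed value to target.

    A result above 1.0 means at least one input exceeds its target.
    """
    return max(
        sample.concurrency / TARGET_CONCURRENCY,
        sample.queue_depth / TARGET_QUEUE_DEPTH,
        sample.p95_latency_ms / TARGET_P95_MS,
    )
```

Taking the maximum means any one overloaded input is enough to raise the signal, so a deep queue is not averaged away by healthy latency.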

01 / Signal

Watch the queue

Each region samples concurrent requests, in-flight builds, and pending jobs every 200 ms.

02 / Decide

Spin or sleep

Above the threshold, a fresh microVM is provisioned from a warm pool — below it, idle instances are reclaimed.
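The spin-or-sleep step can be sketched as a single function. This is a toy model under stated assumptions: the `Fleet` type, the one-instance-per-decision rule, and the default threshold of 1.0 are hypothetical, and the sketch treats the most recently added instance as the idle one.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Fleet:
    """Toy model of one app's instances. All names are hypothetical."""
    warm_pool: deque = field(default_factory=deque)  # pre-provisioned microVM ids
    active: list = field(default_factory=list)       # ids currently serving traffic


def decide(fleet: Fleet, signal: float, threshold: float = 1.0) -> str:
    """Provision one instance above the threshold, reclaim one below it."""
    if signal > threshold and fleet.warm_pool:
        # Take a ready microVM from the warm pool rather than booting one.
        fleet.active.append(fleet.warm_pool.popleft())
        return "provision"
    if signal < threshold and fleet.active:
        # Assumption: the newest instance is the idle one to reclaim.
        fleet.active.pop()
        return "reclaim"
    return "hold"
```

A production scheduler would likely add hysteresis so the fleet does not flap around the threshold; that is left out here to keep the decision visible.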

03 / Route

Cut traffic over

Once the health check passes, the edge router shifts new connections to the new instance, then drains the old one.
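The ordering matters here: health check first, then the shift, then the drain. A minimal sketch of that sequence, assuming a hypothetical `Instance` record and a caller-supplied health check (none of these names come from the product):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Instance:
    """Hypothetical view of one instance as the edge router sees it."""
    name: str
    open_connections: int = 0
    accepting: bool = False


def cut_over(old: Instance, new: Instance,
             is_healthy: Callable[[Instance], bool]) -> bool:
    """Shift new connections to `new` only after its health check passes."""
    if not is_healthy(new):
        return False          # keep routing to the old instance
    new.accepting = True      # new connections now land on the new instance
    old.accepting = False     # old instance stops taking new connections
    return True


def route(instances: list[Instance]) -> Instance:
    """Send a new connection to the first instance that is accepting."""
    target = next(i for i in instances if i.accepting)
    target.open_connections += 1
    return target


def is_drained(instance: Instance) -> bool:
    """A draining instance can be retired once its connections have finished."""
    return not instance.accepting and instance.open_connections == 0
```

Existing connections on the old instance are never moved; they finish where they started, which is what draining means in this sketch.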

Traffic chart

Capacity that follows traffic.

[Chart: x-axis shows time of day at 00:00, 06:00, 12:00, 18:00, 24:00]

Details

The defaults are good defaults.

Sub-second cold starts

MicroVM snapshots are cached at the edge. Median (p50) boot time is 380 ms for a Node 20 app.

Scale-to-zero

No traffic for 5 minutes? The instance retires. The first request afterward restores from a snapshot instead of booting fresh.
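The idle rule is simple enough to sketch directly. The 5-minute timeout is taken from the text; the `IdleTracker` class and its method names are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
IDLE_TIMEOUT_S = 5 * 60  # "No traffic for 5 minutes", per the text above


class IdleTracker:
    """Tracks the last request time for one instance (hypothetical helper)."""

    def __init__(self, now: float) -> None:
        self.last_request_at = now
        self.running = True

    def on_request(self, now: float) -> str:
        """Record a request and report how the instance came to serve it."""
        start = "warm" if self.running else "snapshot-restore"
        self.running = True
        self.last_request_at = now
        return start

    def tick(self, now: float) -> None:
        """Retire the instance once it has been idle for the full timeout."""
        if self.running and now - self.last_request_at >= IDLE_TIMEOUT_S:
            self.running = False
```

For example, a tracker created at time 0 is still running after `tick(299.0)`, retires on `tick(300.0)`, and reports `"snapshot-restore"` for the next request.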

Concurrency-aware

Scaling is keyed off in-flight requests, not CPU. When a slow downstream leaves requests piling up while CPUs sit idle, capacity still scales with the backlog.
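To see why the choice of signal matters, compare two sizing rules on the same situation. Both functions are illustrative assumptions, not the product's formulas: one divides in-flight requests by a per-instance target, the other is a generic proportional CPU rule.

```python
import math


def desired_by_concurrency(in_flight: int, target_per_instance: int) -> int:
    """Instances needed so each handles at most `target_per_instance` requests."""
    return math.ceil(in_flight / target_per_instance)


def desired_by_cpu(current: int, cpu_utilization: float,
                   target_utilization: float) -> int:
    """Generic proportional CPU rule, shown only for comparison."""
    return max(1, math.ceil(current * cpu_utilization / target_utilization))


# Two instances, 400 requests stuck waiting on a slow downstream, CPU at 10%.
by_concurrency = desired_by_concurrency(in_flight=400, target_per_instance=50)
by_cpu = desired_by_cpu(current=2, cpu_utilization=0.10, target_utilization=0.60)
```

In this scenario the concurrency rule asks for 8 instances, while the CPU rule sees idle processors and asks for 1.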

Region-aware

Burst traffic from APAC scales APAC. You aren't paying for capacity in regions that don't need it.

Manual override

Pin a minimum instance count for that one endpoint your customers rely on. Set it in the dashboard or YAML.
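Conceptually, a pinned minimum is a floor applied to whatever the autoscaler asks for. The sketch below assumes a hypothetical `min_instances` setting; the page does not give the actual dashboard field or YAML key.

```python
def apply_floor(desired: int, min_instances: int = 0) -> int:
    """Never scale below the pinned minimum; 0 keeps scale-to-zero behavior."""
    if min_instances < 0:
        raise ValueError("min_instances must be non-negative")
    return max(desired, min_instances)
```

With a floor of 2, a quiet endpoint that would otherwise drop to zero stays at 2, and a busy one that needs 5 still gets 5.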

Per-app metrics

Live charts of cold-start times, concurrency, and scale events. Drill in when something looks off.

Ship your first app today.

Closed beta. Onboarding a few builders each week — most projects are running within an hour of joining.