Autoscaling

Scale to the moment.
Then back to zero.

Capacity arrives when traffic does — usually in 600 ms or less — and quietly retires when things go quiet. You don't keep boxes warm for visitors who never showed up.


How it works

Three checkpoints, one decision.

Every incoming request is metered. Concurrency, queue depth, and p95 latency feed a single signal — the scheduler reacts in milliseconds, not minutes.

01 / Signal

Watch the queue

Each region samples concurrent requests, in-flight builds, and pending jobs every 200 ms.

02 / Decide

Spin or sleep

Above the threshold, a fresh microVM is provisioned from a warm pool — below it, idle instances are reclaimed.

03 / Route

Cut traffic over

Once the health check passes, the edge router shifts new connections to the new instance, then drains the old one.
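The Decide and Route steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the scheduler itself: the names `Sample`, `decide`, and `route`, the per-instance threshold of 10, and the "hold" band between spinning and sleeping are all hypothetical and do not come from the platform.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    in_flight: int   # concurrent requests seen in one 200 ms sample
    instances: int   # instances currently serving traffic


def decide(sample: Sample, threshold_per_instance: int = 10) -> str:
    """Return 'spin', 'sleep', or 'hold' for a single sample.

    The threshold value and the 'hold' band are assumptions made for
    this sketch; the text above only says "above" and "below".
    """
    capacity = sample.instances * threshold_per_instance
    if sample.in_flight > capacity:
        return "spin"  # provision a microVM from the warm pool
    # Reclaim only if the remaining instances could still carry the load.
    remaining = (sample.instances - 1) * threshold_per_instance
    if sample.instances > 0 and sample.in_flight <= remaining:
        return "sleep"
    return "hold"


def route(old: str, new: str, healthy: bool) -> dict:
    """Shift new connections to `new` only after its health check passes."""
    if not healthy:
        return {"new_connections": old, "draining": None}
    return {"new_connections": new, "draining": old}
```

For example, `decide(Sample(in_flight=25, instances=2))` returns `"spin"` because 25 requests exceed the assumed capacity of 20, and `route("vm-old", "vm-new", healthy=True)` sends new connections to `vm-new` while `vm-old` drains.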

Traffic chart

Capacity that follows traffic.

[Chart: time axis with ticks at 00:00, 06:00, 12:00, 18:00, 24:00]

Details

The defaults are good defaults.

Sub-second cold starts

microVM snapshots cached at the edge — p50 boot is 380 ms for a Node 20 app.

Scale-to-zero

No traffic for 5 minutes? The instance retires. The first request afterward restores from a snapshot instead of booting fresh.
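The idle rule is simple enough to sketch. The 5-minute window comes from the text above; the function name `should_retire`, its parameters, and the guard on in-flight requests are assumptions made for illustration.

```python
IDLE_TIMEOUT_S = 5 * 60  # "No traffic for 5 minutes"


def should_retire(last_request_at: float, now: float, in_flight: int = 0) -> bool:
    """True once an instance has seen no traffic for the full idle window.

    Timestamps are seconds on any shared clock. Refusing to retire while
    requests are still in flight is an assumption of this sketch.
    """
    if in_flight > 0:
        return False
    return (now - last_request_at) >= IDLE_TIMEOUT_S
```

An instance whose last request arrived at t = 0 is kept at t = 299 and retired at t = 300.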

Concurrency-aware

Scaling is keyed to in-flight requests, not CPU. When a slow downstream leaves requests waiting while CPU sits idle, capacity still grows with the backlog.
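To see why the choice of signal matters, here is a minimal sketch of concurrency-keyed sizing. The formula (in-flight requests divided by a per-instance target, rounded up) is a common approach and an assumption here, not a statement of how the platform computes it; `desired_instances` and `target_concurrency` are hypothetical names.

```python
import math


def desired_instances(in_flight: int, target_concurrency: int) -> int:
    """Instances needed so each carries at most `target_concurrency` requests."""
    if target_concurrency <= 0:
        raise ValueError("target_concurrency must be positive")
    return math.ceil(in_flight / target_concurrency)
```

Suppose a slow database holds 120 requests open while CPU sits near zero. With an assumed target of 20 requests per instance, `desired_instances(120, 20)` asks for 6 instances, whereas a CPU-based rule would see little reason to scale.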

Region-aware

Burst traffic from APAC scales APAC. You aren't paying for capacity in regions that don't need it.

Manual override

Pin a minimum instance count for that one endpoint your customers rely on. Set it in the dashboard or YAML.
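In effect, a pinned minimum acts as a floor under whatever the autoscaler would otherwise choose. The sketch below shows that interaction only; `apply_floor` and its parameters are hypothetical, and the actual dashboard and YAML settings are not shown here.

```python
def apply_floor(desired: int, min_instances: int = 0) -> int:
    """Never go below the pinned minimum, even when traffic is zero."""
    if desired < 0 or min_instances < 0:
        raise ValueError("instance counts must be non-negative")
    return max(desired, min_instances)
```

With a floor of 2, an idle endpoint stays at 2 instances instead of scaling to zero, while a burst that needs 5 still gets 5.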

Per-app metrics

Live charts of cold-start times, concurrency, and scale events. Drill in when something looks off.

Ship your first app today.

Closed beta. Onboarding a few builders each week — most projects are running within an hour of joining.

