Instances
An instance is one or more GPUs of a single SKU in a region, launched from an image.
Lifecycle
Instances move through the states pending → provisioning → running → stopping → stopped → terminated.
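The documented progression can be modelled client-side as an ordered enum. This is a minimal sketch: `InstanceState`, `next_state`, and `is_forward_step` are hypothetical names, not part of any published SDK. The sketch encodes only the linear path listed above, so transitions such as starting a stopped instance again are deliberately left out.

```python
from enum import Enum
from typing import Optional


class InstanceState(Enum):
    """Lifecycle states, defined in the documented order (hypothetical client type)."""
    PENDING = "pending"
    PROVISIONING = "provisioning"
    RUNNING = "running"
    STOPPING = "stopping"
    STOPPED = "stopped"
    TERMINATED = "terminated"


# Iterating an Enum yields its members in definition order.
_ORDER = list(InstanceState)


def next_state(state: InstanceState) -> Optional[InstanceState]:
    """Return the state after `state` in the documented sequence,
    or None for the final state (terminated)."""
    idx = _ORDER.index(state)
    return _ORDER[idx + 1] if idx + 1 < len(_ORDER) else None


def is_forward_step(src: InstanceState, dst: InstanceState) -> bool:
    """True only if `dst` immediately follows `src` in the sequence.
    Other transitions (e.g. restarting a stopped instance) are not modelled."""
    return next_state(src) is dst
```

Assuming raw state strings use the lowercase names above, `InstanceState("running")` converts one into the enum before checking it.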
You can start, stop, reboot, resize, and terminate an instance. The Control Plane
drives every state change and reports it over webhooks, with polling as a
fallback.
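A client that consumes webhooks but falls back to polling might track state as below. This is a minimal sketch under stated assumptions: `StateTracker` is a hypothetical helper, `fetch_state` stands in for whatever call queries the Control Plane (its real API is not described here), and the 30-second staleness window is an arbitrary example value.

```python
import time
from typing import Callable, Optional


class StateTracker:
    """Tracks one instance's state from webhook pushes, polling when no
    webhook has arrived within `stale_after` seconds (hypothetical helper)."""

    def __init__(self, fetch_state: Callable[[], str], stale_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_state = fetch_state  # hypothetical: queries the Control Plane
        self._stale_after = stale_after
        self._clock = clock              # injectable for testing
        self._state: Optional[str] = None
        self._last_update: Optional[float] = None

    def on_webhook(self, state: str) -> None:
        """Call from the webhook handler when a state-change event arrives."""
        self._state = state
        self._last_update = self._clock()

    def current(self) -> str:
        """Return the latest known state, polling if no webhook has been
        seen yet or the last one is older than `stale_after`."""
        now = self._clock()
        if self._last_update is None or now - self._last_update > self._stale_after:
            self._state = self._fetch_state()  # polling fallback
            self._last_update = now
        assert self._state is not None
        return self._state
```

Keeping the poll behind a staleness check means a healthy webhook stream costs no extra requests, while a dropped or delayed webhook is still corrected on the next read.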
Pricing tiers
- On-demand: billed per GPU-hour; launch and terminate at will.
- Reserved: 1-, 6-, or 12-month commitment at a discounted rate.
- Spot: interruptible and cheapest; expect only a short notice before preemption.
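Choosing between on-demand and reserved comes down to utilization arithmetic. The sketch below assumes a reservation is billed at a flat per-GPU-hour rate for every hour of the term, used or not, and averages a month to 730 hours; both are assumptions, and the function names are hypothetical.

```python
def on_demand_cost(rate_per_gpu_hour: float, gpus: int, hours_used: float) -> float:
    """On-demand: pay only for the GPU-hours actually used."""
    return rate_per_gpu_hour * gpus * hours_used


def reserved_cost(rate_per_gpu_hour: float, gpus: int, months: int,
                  hours_per_month: float = 730.0) -> float:
    """Reserved (assumed billing model): pay for every hour of the term.
    730 h/month is an averaging assumption (8760 h / 12)."""
    if months not in (1, 6, 12):
        raise ValueError("reserved terms are 1, 6, or 12 months")
    return rate_per_gpu_hour * gpus * months * hours_per_month


def breakeven_utilization(on_demand_rate: float, reserved_rate: float) -> float:
    """Fraction of the term an instance must run for a reservation to be cheaper."""
    return reserved_rate / on_demand_rate
```

With made-up rates of 2.00 on-demand and 1.40 reserved per GPU-hour, the break-even is 0.7: the reservation pays off only if the GPUs would otherwise run more than 70% of the term.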
Telemetry
Each running instance streams GPU utilization, VRAM usage, temperature, power draw, NVLink throughput, and tokens/sec. Set alert thresholds on these metrics to be notified of anomalies.
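Threshold alerting amounts to comparing each telemetry sample against per-metric bounds. This is a minimal sketch: the metric keys (`gpu_temp_c`, `gpu_util_pct`, `power_w`) and the `Threshold`/`breached` helpers are hypothetical, since the platform's actual metric names and alert configuration are not described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Threshold:
    """Alert bounds for one metric; None means no bound on that side."""
    low: Optional[float] = None
    high: Optional[float] = None


def breached(sample: Dict[str, float], thresholds: Dict[str, Threshold]) -> List[str]:
    """Return names of metrics in `sample` outside their configured bounds.
    Metrics with no threshold configured are ignored."""
    alerts = []
    for name, value in sample.items():
        t = thresholds.get(name)
        if t is None:
            continue
        too_low = t.low is not None and value < t.low
        too_high = t.high is not None and value > t.high
        if too_low or too_high:
            alerts.append(name)
    return alerts
```

A low bound on utilization is as useful as a high bound on temperature: it flags GPUs that sit idle while still accruing charges.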
Spend caps
Set a per-organization spend cap with a soft alert and a hard stop. When spending reaches the hard stop, new launches are paused automatically.
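The soft-alert/hard-stop behaviour can be sketched as a small status check. This assumes the soft alert and hard stop are two separate amounts in the same currency and that a threshold counts as "reached" once spend is greater than or equal to it; `SpendCap` and `CapStatus` are hypothetical names. The sketch gates new launches only, since nothing here says what happens to instances that are already running.

```python
from dataclasses import dataclass
from enum import Enum


class CapStatus(Enum):
    OK = "ok"
    SOFT_ALERT = "soft_alert"  # notify; launches still allowed
    HARD_STOP = "hard_stop"    # new launches paused


@dataclass(frozen=True)
class SpendCap:
    """Per-organization spend cap (hypothetical model)."""
    soft_alert: float
    hard_stop: float

    def __post_init__(self) -> None:
        if not 0 <= self.soft_alert <= self.hard_stop:
            raise ValueError("expected 0 <= soft_alert <= hard_stop")

    def status(self, spend: float) -> CapStatus:
        """Classify current spend against both thresholds."""
        if spend >= self.hard_stop:
            return CapStatus.HARD_STOP
        if spend >= self.soft_alert:
            return CapStatus.SOFT_ALERT
        return CapStatus.OK

    def can_launch(self, spend: float) -> bool:
        """New launches are allowed until the hard stop is reached."""
        return self.status(spend) is not CapStatus.HARD_STOP
```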