GPU compute, on tap.
On-demand, reserved, and spot access to NVIDIA GPU compute. On-demand, reserved, or spot — billed per GPU-second with the region multiplier applied at metering time.
Instances
Launch a GPU in a region from a template. Start, stop, resize, terminate. Cost-so-far meter.
Clusters
Multi-node NVLink / InfiniBand domains with shared volumes and chosen topology.
Managed endpoints
Deploy a vLLM or TensorRT-LLM endpoint to an HTTPS URL. Autoscale, scale-to-zero.
Storage
Persistent block volumes — attach, detach, snapshot — and S3-compatible buckets.
Networking
Private VPC, public IPs, firewall rules, and region peering.
Monitoring
GPU util, VRAM, temp, power, NVLink throughput, tokens/sec. Alert thresholds.
H100 SXM
The proven Hopper workhorse for inference and fine-tuning.
- MEM
- 80 GB HBM3
- BW
- 3.35 TB/s
- NVLINK
- Gen 4
H200 SXM
141 GB HBM3e for long-context inference without the spill.
- MEM
- 141 GB HBM3e
- BW
- 4.8 TB/s
- NVLINK
- Gen 4
B200
Blackwell training and inference. Supply-constrained — join the waitlist.
- MEM
- 192 GB HBM3e
- BW
- 8 TB/s
- NVLINK
- Gen 5
B300
288 GB HBM3e. 15 PFLOPS dense FP4. Built for agentic AI.
- MEM
- 288 GB HBM3e
- BW
- 8 TB/s
- NVLINK
- Gen 5
Launch a GPU in minutes.
On-demand, reserved, or spot — billed per GPU-second with the region multiplier applied at metering time.