AI Cloud & Inference
Where GPUs are rented by the hour and tokens are sold by the million.
What this layer does
This is the financial pivot point of the AI economy. Application revenue and model-API revenue land here as compute spend, and from here the money fans out into hardware, real estate, and power. More than $300 billion of annual hyperscaler capex flows through this layer.
The layer splits into training compute (large reserved clusters sold to a handful of labs) and inference compute (token-priced APIs and dedicated endpoints). Training is concentrated in a few buyers and a few clouds. Inference is fragmenting fast — specialists like Groq and Cerebras compete on tokens-per-second, while neoclouds undercut hyperscalers on price for less differentiated workloads.
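Because the two pricing models coexist (GPUs rented by the hour, tokens sold by the million), buyers constantly convert between them to decide whether to rent capacity or pay per token. A minimal sketch of that conversion; all prices and throughput figures are hypothetical placeholders, not quotes from any provider named above:

```python
# Rough conversion between GPU-hour pricing and per-token pricing.
# Every number here is a hypothetical placeholder for illustration.

def cost_per_million_tokens(gpu_hour_price: float, tokens_per_second: float) -> float:
    """Effective $/1M tokens for a rented GPU sustaining a given throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hour_price / tokens_per_hour * 1_000_000

# Example: a GPU rented at $3.00/hour serving 400 tokens/s
rented = cost_per_million_tokens(gpu_hour_price=3.00, tokens_per_second=400)

# Compare against a token-priced API at, say, $2.00 per 1M tokens
api_price = 2.00

print(f"Rented GPU: ${rented:.2f} per 1M tokens")   # ~$2.08
print(f"Token API:  ${api_price:.2f} per 1M tokens")
print("Renting wins" if rented < api_price else "API wins")
```

The break-even point moves with utilization: the rented-GPU figure assumes the card is busy the whole hour, which is why low-utilization workloads tend to stay on token-priced APIs while sustained, predictable traffic migrates to rented or dedicated capacity.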
Sub-categories
The big three plus Oracle. Where most enterprise AI deployments end up because the data is already there.
GPU-focused clouds built from the ground up for AI workloads. The 2023-2026 capex story.
Custom-silicon or highly optimized inference clouds competing on speed and cost per token.
Lower-cost or geographically specific GPU rental, typically Hopper or Blackwell capacity sold by the hour.
National-champion clouds. Politically driven capex, often state-backed.
Sitting on enterprise data — the obvious place to attach inference. Bundle: data warehouse + model serving + governance.
Spot-market layer over the neoclouds and decentralized capacity.