The AI Infrastructure Stack
Overview  /  Tier II Compute as a Service
Layer 04

AI Cloud & Inference

Where GPUs are rented by the hour and tokens are sold by the million.

What this layer does

This is the financial pivot point of the AI economy. Application revenue and model-API revenue land here as compute spend, and from here the money fans out into hardware, real estate, and power. More than $300 billion of annual hyperscaler capex flows through this layer.

The layer splits into training compute (large reserved clusters sold to a handful of labs) and inference compute (token-priced APIs and dedicated endpoints). Training is concentrated in a few buyers and a few clouds. Inference is fragmenting fast — specialists like Groq and Cerebras compete on tokens-per-second, while neoclouds undercut hyperscalers on price for less differentiated workloads.
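The tension between the two pricing models above can be made concrete with a back-of-envelope conversion from an hourly GPU rental rate to an effective cost per million tokens. This is a minimal sketch; the rate, throughput, and function name are illustrative assumptions, not vendor quotes.

```python
# Back-of-envelope: convert an hourly GPU rental rate into an effective
# cost per million output tokens. All figures are illustrative assumptions.

def cost_per_million_tokens(gpu_hour_rate: float, tokens_per_second: float) -> float:
    """Effective $/1M tokens for a GPU rented at gpu_hour_rate ($/hr)
    sustaining tokens_per_second of throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hour_rate / tokens_per_hour * 1_000_000

# Assumed numbers: a $2.50/hr GPU sustaining 500 tokens/sec.
print(round(cost_per_million_tokens(2.50, 500), 2))  # → 1.39
```

The sensitivity to throughput is why tokens-per-second specialists can compete: doubling sustained throughput halves the effective per-token cost at the same hourly rate.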

Sub-categories

Analysis coming soon — will cover: hyperscaler capex unit economics (rev/$capex by cohort), neocloud bear case (depreciation vs. GPU resale value, customer concentration on a single anchor), inference vs. training margin gap, and why some Bitcoin miners successfully pivoted while others didn't.
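The depreciation question at the core of the neocloud bear case reduces to simple arithmetic: does rental revenue over a GPU's rentable life cover the capital lost between purchase price and resale value? A minimal sketch, with every input an assumed placeholder rather than observed market data:

```python
# Sketch of the neocloud depreciation question: does rental revenue over the
# GPU's rentable life exceed the capital lost to depreciation?
# All inputs are assumptions for illustration, not market data.

def net_margin_per_gpu(purchase_price: float, resale_value: float,
                       life_years: float, hourly_rate: float,
                       utilization: float) -> float:
    """Rental revenue minus capital depreciation over the GPU's life
    (ignores power, space, financing, and operating costs)."""
    hours = life_years * 365 * 24
    revenue = hours * utilization * hourly_rate
    depreciation = purchase_price - resale_value
    return revenue - depreciation

# Assumed: $30k GPU, $6k resale after 4 years, $2/hr at 60% utilization.
print(net_margin_per_gpu(30_000, 6_000, 4, 2.0, 0.60))  # → 18048.0
```

The sketch shows why the bear case hinges on utilization and resale value: lower either one and the margin compresses quickly, especially for a neocloud whose revenue depends on a single anchor customer renewing.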