The AI Infrastructure Stack
Layer 07

Networking

Moving bits between GPUs, between racks, and between data centers.

What this layer does

A 100,000-GPU training cluster only works if every GPU can reach every other GPU at hundreds of gigabits per second with microsecond-scale latency. The networking layer makes that physically possible. Three largely independent fabrics typically coexist: the back-end fabric (GPU-to-GPU, the highest bandwidth), the front-end fabric (server-to-storage and external traffic), and scale-out / data-center interconnect (campus and metro links).
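To put rough numbers on the back-end fabric, here is a minimal sizing sketch. It assumes one 400G NIC port per GPU, radix-128 switches (a 51.2T ASIC broken out as 128 × 400G), and a non-blocking fat-tree; every parameter is an illustrative assumption, not any vendor's actual topology.

```python
# Back-of-envelope sizing of the back-end (GPU-to-GPU) fabric described
# above. Every constant here is an illustrative assumption, not a spec.

GPUS = 100_000        # cluster size from the text
NIC_GBPS = 400        # assumed: one 400G back-end port per GPU
SWITCH_RADIX = 128    # assumed: 51.2T switch ASIC run as 128 x 400G

def fat_tree_tiers(hosts: int, radix: int) -> int:
    """Smallest tier count t such that a non-blocking fat-tree of
    radix-r switches fits `hosts`: capacity = r * (r // 2) ** (t - 1)."""
    tiers, capacity = 1, radix
    while capacity < hosts:
        tiers += 1
        capacity = radix * (radix // 2) ** (tiers - 1)
    return tiers

tiers = fat_tree_tiers(GPUS, SWITCH_RADIX)

# Non-blocking means every tier carries the full host bandwidth, so the
# fabric has roughly tiers * GPUS links in total.
links = tiers * GPUS

# Assume the host-to-leaf hop is copper (in-rack DAC) and every
# switch-to-switch link terminates in two optical transceivers.
optics = 2 * (links - GPUS)

print(f"tiers:               {tiers}")                           # -> 3
print(f"total links:         {links:,}")                         # -> 300,000
print(f"transceivers:        {optics:,}")                        # -> 400,000
print(f"optics per GPU:      {optics / GPUS:.1f}")               # -> 4.0
print(f"aggregate injection: {GPUS * NIC_GBPS / 8e6:.1f} PB/s")  # -> 5.0
```

Under these assumptions the fabric consumes about four optical transceivers per GPU, which is one way to ground the optical-content-per-GPU ratio flagged below.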

This is a remarkably good business to be in right now: optical transceiver shipments are growing 40%+ annually and each speed transition (400G → 800G → 1.6T) is roughly a doubling of dollar content per port. The interesting structural questions are (1) Ethernet vs. InfiniBand, (2) when co-packaged optics replaces pluggable transceivers, and (3) whether merchant silicon (Broadcom) keeps eating Nvidia’s networking attach.
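A minimal sketch of why those two drivers compound, assuming a hypothetical three-year cadence between speed generations and treating the mix shift as a step change (real transitions are gradual, so this overstates the near-term curve):

```python
# Illustrative arithmetic for the two growth drivers above: unit
# shipments compounding at ~40%/yr and dollar content per port roughly
# doubling each speed generation. All inputs are assumptions, not data.

UNIT_GROWTH = 1.40      # 40%+ annual shipment growth (from the text)
ASP_STEP = 2.0          # ~2x dollar content per port per transition
YEARS_PER_STEP = 3      # assumed cadence: 400G -> 800G -> 1.6T

units, asp = 1.0, 1.0
for year in range(1, 7):
    units *= UNIT_GROWTH
    if year % YEARS_PER_STEP == 0:
        asp *= ASP_STEP  # crude: treats the mix shift as a step change
    print(f"year {year}: units {units:5.2f}x  ASP {asp:3.1f}x  "
          f"dollars {units * asp:6.2f}x")
```

Under these inputs, dollar growth runs well ahead of unit growth; that multiplicative structure, not the unit number alone, is what makes the speed transitions matter.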

Sub-categories

Analysis coming soon. Will cover: Ethernet vs. InfiniBand, the 800G → 1.6T transition and CPO timing, how much of Arista's earnings are cyclical over-earning vs. structural growth, and optical content per GPU as a tracking ratio.