Networking
Moving bits between GPUs, between racks, and between data centers.
What this layer does
A 100,000-GPU training cluster only works if every GPU can talk to every other GPU at hundreds of gigabits per second with microsecond latency. The networking layer makes that physically possible. Three independent fabrics typically coexist: back-end (GPU-to-GPU, the highest bandwidth), front-end (server-to-storage / external), and scale-out / DC interconnect (campus + metro).
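To make that scale concrete, here is a rough sizing sketch in Python. The cluster size comes from the paragraph above; the port speed, switch radix, tier count, and non-blocking topology are illustrative assumptions (real fabrics use oversubscription and rail-optimized designs that change the numbers).

```python
# Back-of-the-envelope sizing of the back-end (GPU-to-GPU) fabric.
# Every constant below is an illustrative assumption, not a vendor spec.

GPUS = 100_000           # cluster size cited in the text
PORT_SPEED_GBPS = 400    # assumed per-GPU back-end NIC speed
RADIX = 64               # assumed ports per switch ASIC
TIERS = 3                # assumed non-blocking 3-tier Clos / fat-tree

# In a non-blocking Clos, each GPU port is matched by roughly one
# inter-switch link per additional tier.
host_links = GPUS                     # GPU/NIC -> leaf switch (often copper in-rack)
fabric_links = GPUS * (TIERS - 1)     # leaf -> spine and spine -> core (optical)

transceivers = 2 * fabric_links       # one optical module at each end of a fabric link
switch_ports = host_links + 2 * fabric_links
switches = switch_ports // RADIX

print(f"inter-switch optical links : ~{fabric_links:,}")
print(f"pluggable transceivers     : ~{transceivers:,}")
print(f"switch ASICs               : ~{switches:,}")
print(f"aggregate injection bw     : ~{GPUS * PORT_SPEED_GBPS / 1e6:.0f} Pb/s")
```

Even under these simplified assumptions, a single back-end fabric implies hundreds of thousands of optical modules and thousands of switch ASICs, which is why the sub-categories below each carry real dollar content.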
This is a remarkably good business to be in right now: optical transceiver shipments are growing 40%+ annually and each speed transition (400G → 800G → 1.6T) is roughly a doubling of dollar content per port. The interesting structural questions are (1) Ethernet vs. InfiniBand, (2) when co-packaged optics replaces pluggable transceivers, and (3) whether merchant silicon (Broadcom) keeps eating Nvidia’s networking attach.
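A quick sketch of how those two growth drivers compound. The 40%+ unit growth and the ~2x step in per-port dollar content come from the paragraph above; the three-year transition cadence is purely an assumption for illustration, not a forecast.

```python
# Illustrative compounding of unit growth and per-port dollar content.
# Inputs are assumptions for the sketch, not forecasts.

UNIT_GROWTH = 0.40        # ~40%+ annual transceiver unit growth (from the text)
ASP_STEP = 2.0            # ~2x dollar content per port at each speed transition
YEARS_PER_TRANSITION = 3  # assumed cadence of 400G -> 800G -> 1.6T

annual_asp_growth = ASP_STEP ** (1 / YEARS_PER_TRANSITION) - 1
annual_dollar_growth = (1 + UNIT_GROWTH) * (1 + annual_asp_growth) - 1

print(f"implied annual ASP growth   : {annual_asp_growth:.0%}")    # ~26%
print(f"implied annual dollar growth: {annual_dollar_growth:.0%}") # ~76%
```

Under these assumptions, dollar growth runs well ahead of unit growth, which is what makes the speed transitions the key variable to watch.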
Sub-categories
Ethernet: The dominant choice outside Nvidia-anchored training clusters. Driven by the Ultra Ethernet Consortium and Broadcom Tomahawk / Jericho silicon.
InfiniBand: Nvidia’s proprietary high-performance fabric (via Mellanox). Still dominant in the largest training runs.
NICs / DPUs: The card on each server that handles network offload, RDMA, security, and storage.
Optical transceivers: 400G, 800G, and 1.6T pluggable optics. The largest dollar-content category in the networking layer.
Optical contract manufacturing: The Foxconn of optics, building modules for Coherent, Cisco, etc.
Optical DSPs: The signal-processing silicon inside every transceiver. The hidden moat in the optical supply chain.
Retimers: PCIe / CXL / CXL-over-Ethernet retimers. Astera Labs is the breakout name.
Active electrical cables (AECs): Short-reach (~3 m) copper alternative to optics inside a rack. Credo’s AEC business.
Co-packaged optics (CPO): Putting the optics directly on the switch ASIC or accelerator package. Emerging; a structural threat and opportunity for transceiver vendors.
Campus fiber: Mile after mile of single-mode fiber inside hyperscale campuses.
Data center interconnect (DCI): Connecting data centers across cities and oceans. Driven by the geo-distribution of training clusters.