From Silicon to Softmax

A structured curriculum that runs from bare-metal systems programming to distributed GPU clusters: the path from writing apps to making $100B data centers work.

4 modules · 1 lesson

Module 1: The Low-Level Foundation

Systems programming in Rust, CPU architecture, SIMD, memory hierarchy, and Linux performance profiling.

Module 2: GPU & Parallelism

CUDA programming, Triton kernels, parallel algorithms, kernel fusion, and FlashAttention.

Coming soon

Module 3: Distributed Systems

RDMA, InfiniBand, NCCL, distributed training with DDP/FSDP, and the 3D parallelism grid (data, tensor, and pipeline parallelism).

Coming soon

Module 4: ML Internals & Optimization

Quantization, inference optimization, and the Rust GPU frontier.

Coming soon