From Silicon to Softmax
A structured curriculum from bare-metal systems programming to distributed GPU clusters: the path from writing apps to making $100B data centers work.
4 modules · 1 lesson
Module 1: The Low-Level Foundation
Systems programming in Rust, CPU architecture, SIMD, memory hierarchy, and Linux performance profiling.
Module 2: GPU & Parallelism
CUDA programming, Triton kernels, parallel algorithms, kernel fusion, and FlashAttention.
Coming soon
Module 3: Distributed Systems
RDMA, InfiniBand, NCCL, distributed training with DDP/FSDP, and the 3D parallelism grid (data, tensor, and pipeline parallelism).
Coming soon
Module 4: ML Internals & Optimization
Quantization, inference optimization, and the Rust GPU frontier.
Coming soon