I'm a junior at Stanford University pursuing a BS in CS (AI) and an MS in CS (systems). I'm interested in the intersection of machine learning and systems. My recent work has included hardware-aware model optimization at NVIDIA and the Stanford AI Lab, and this summer I'll be joining OpenAI's infrastructure team. I'm very grateful to the amazing mentors I've had along the way, including Dan Fu, Ethan He, and Jan-Philipp Fränken.
Introduces hardware-aware dynamic sparsity patterns and optimized attention and GEMM CUDA kernels that selectively recompute rapidly changing activations, accelerating diffusion transformers by up to 3.7x without retraining. A sketch of the change-detection idea follows.
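The project's kernels aren't reproduced here, but a minimal CUDA sketch of the core idea, deciding per token whether a cached activation has changed enough to be worth recomputing, might look like the following. The kernel name, threshold heuristic, and launch shape are illustrative assumptions, not the project's actual API.

```cuda
// Illustrative sketch only: flags tokens whose activations changed enough
// since the last diffusion step to be worth recomputing. Names and the
// threshold heuristic are hypothetical, not the project's real kernels.
#include <cstdio>
#include <cuda_runtime.h>

// One block per token: threads cooperatively reduce the squared L2 change
// of that token's activation row, then thread 0 flags it for recompute.
__global__ void mark_stale_tokens(const float* prev, const float* curr,
                                  int dim, float tau2, int* recompute) {
    extern __shared__ float partial[];
    int token = blockIdx.x;
    float acc = 0.0f;
    for (int i = threadIdx.x; i < dim; i += blockDim.x) {
        float d = curr[token * dim + i] - prev[token * dim + i];
        acc += d * d;
    }
    partial[threadIdx.x] = acc;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {  // tree reduction
        if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) recompute[token] = (partial[0] > tau2);
}

int main() {
    const int tokens = 1024, dim = 256, threads = 128;
    float *prev, *curr; int *mask;
    cudaMallocManaged(&prev, tokens * dim * sizeof(float));
    cudaMallocManaged(&curr, tokens * dim * sizeof(float));
    cudaMallocManaged(&mask, tokens * sizeof(int));
    for (int i = 0; i < tokens * dim; ++i) prev[i] = curr[i] = 0.0f;
    for (int t = 0; t < tokens; t += 4) curr[t * dim] = 1.0f;  // perturb 1/4
    mark_stale_tokens<<<tokens, threads, threads * sizeof(float)>>>(
        prev, curr, dim, /*tau2=*/0.5f, mask);
    cudaDeviceSynchronize();
    int flagged = 0;
    for (int t = 0; t < tokens; ++t) flagged += mask[t];
    printf("recompute %d of %d tokens\n", flagged, tokens);  // 256 of 1024
    return 0;
}
```

Unflagged tokens would reuse cached outputs, so the expensive attention and GEMM kernels run only on the compacted set of flagged rows; the skipped work is where the speedup comes from.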
Reverse-engineered the matrix-core register layouts on AMD Instinct GPUs and implemented the core ThunderKittens primitives on top of them; the result is a functional GEMM in roughly 10 lines. In progress and very experimental!
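The AMD port itself is still in flux and not shown here. As a rough analogue, the sketch below uses NVIDIA's public WMMA API (not ThunderKittens, and not the AMD matrix-core path) to show how warp-level tile primitives compress a GEMM into about ten lines of kernel code.

```cuda
// Analogy only: NVIDIA's WMMA tile primitives, not ThunderKittens or AMD
// matrix cores. Compile with: nvcc -arch=sm_70 tile_gemm.cu
#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp computes a full 16x16x16 tile: D = A * B.
__global__ void tile_gemm(const half* A, const half* B, float* D) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;
    wmma::fill_fragment(acc, 0.0f);
    wmma::load_matrix_sync(a, A, 16);   // 16 = leading dimension
    wmma::load_matrix_sync(b, B, 16);
    wmma::mma_sync(acc, a, b, acc);     // matrix cores do the 16x16x16 MAC
    wmma::store_matrix_sync(D, acc, 16, wmma::mem_row_major);
}

int main() {
    half *A, *B; float *D;
    cudaMallocManaged(&A, 256 * sizeof(half));
    cudaMallocManaged(&B, 256 * sizeof(half));
    cudaMallocManaged(&D, 256 * sizeof(float));
    for (int i = 0; i < 256; ++i) {
        A[i] = __float2half(i / 16 == i % 16 ? 1.0f : 0.0f);  // A = identity
        B[i] = __float2half((float)(i % 7));
    }
    tile_gemm<<<1, 32>>>(A, B, D);  // a single warp owns the tile
    cudaDeviceSynchronize();
    printf("D[0][1] = %.1f (expect 1.0 since A = I)\n", D[1]);
    return 0;
}
```

ThunderKittens-style tile types play the same role as the fragments here; the reverse-engineering work is about recovering the per-lane register ownership that such abstractions hide, so equivalent primitives can target the Instinct matrix cores.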