I'm a junior at Stanford University pursuing a BS in CS (AI) and an MS in CS (systems). I'm interested in the intersection of machine learning and systems. My recent work has included hardware-aware model optimization at NVIDIA and the Stanford AI Lab, and this summer I'll be joining OpenAI. I'm very thankful to the amazing mentors I've had along the way, including Dan Fu, Ethan He, and Jan-Philipp Fränken.
Introduces hardware-aware dynamic sparsity patterns and optimized attention + GEMM CUDA kernels that recompute only rapidly-changing activations across diffusion steps (reusing cached values elsewhere), accelerating diffusion transformers by up to 3.7x without retraining.
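The actual speedups come from fused CUDA kernels, but the core caching idea can be sketched at the PyTorch level. Below is a minimal, illustrative sketch (not the paper's implementation): a token-wise sublayer is wrapped so that, at each diffusion step, only tokens whose inputs moved more than a threshold since the previous step are recomputed, while the rest reuse cached outputs. The relative-change metric, the `threshold` value, and all names here are hypothetical.

```python
import torch
import torch.nn as nn

class SelectiveRecompute(nn.Module):
    """Toy activation cache across diffusion steps for a token-wise
    sublayer (e.g. an MLP). Hypothetical sketch, not the paper's kernels:
    tokens whose inputs changed more than `threshold` (relative L2) since
    the previous step are recomputed; the rest reuse cached outputs."""

    def __init__(self, sublayer: nn.Module, threshold: float = 0.05):
        super().__init__()
        self.sublayer = sublayer
        self.threshold = threshold
        self.prev_in = None   # inputs seen at the previous diffusion step
        self.prev_out = None  # cached outputs from the previous step

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim). The sublayer must act per-token for the
        # masked gather/scatter below to be valid (true for MLPs, not
        # for attention, which mixes tokens).
        if self.prev_in is None:
            out = self.sublayer(x)                      # first step: full compute
        else:
            delta = (x - self.prev_in).norm(dim=-1)     # per-token change
            rel = delta / (self.prev_in.norm(dim=-1) + 1e-6)
            mask = rel > self.threshold                 # (batch, tokens) bool
            out = self.prev_out.clone()                 # start from cache
            if mask.any():
                out[mask] = self.sublayer(x[mask])      # recompute fast movers only
        self.prev_in, self.prev_out = x.detach(), out.detach()
        return out
```

In a real diffusion transformer one would reset the cache at the start of each sampling run and tune the threshold per layer; the fraction of tokens crossing it determines the dynamic sparsity pattern that the fused kernels exploit.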