GPU Performance Engineer | CUDA, Low-Latency Systems, LLM Inference

Yupeng Han

GPU performance engineer focused on CUDA, low-latency systems, and large-scale compute optimization. Recently expanding into LLM inference systems, with an emphasis on transformer serving, KV-cache trade-offs, batching, roofline analysis, and distributed communication.

Experience

Staff Software Engineer, Plus AI
Senior GPU Engineer, EBots
R&D Engineer, Trifo
Research Engineer, CMU Robotics Institute

Featured