Yupeng Han

GPU performance engineer focused on CUDA, low-latency systems, and large-scale compute optimization. Recently expanding into LLM inference systems, with emphasis on transformer serving, KV-cache trade-offs, batching, roofline analysis, and distributed communication.

Professional Experience
Selected Links

Contact