Kevin Galim
Senior AI Research Engineer · FuriosaAI · Seoul, South Korea
I am a Senior AI Research Engineer at FuriosaAI, working on efficient LLM inference, post-training systems, and accelerator-aware training/inference pipelines. My research sits at the intersection of efficient inference, post-training, and AI systems.
I have authored 10+ publications at venues including ICLR, ICML, ACL, CVPR, ECCV, and WACV, with work spanning KV-cache and prompt/context compression, parameter-efficient adaptation for state space models, and diffusion LLMs with parallel decoding.
Before joining FuriosaAI, I worked on applied computer vision at Funzin, including autonomous golf cart perception and CES 2021 demos; GPU-accelerated image processing at ARRI in Munich; and freelance AR/web development. I received my M.Sc. in Informatics (Games Engineering) from the Technical University of Munich (grade 1.4), including a semester of research in computer graphics at the University of Tokyo.
Research interests:
- Efficient LLM inference: KV-cache and prompt/context compression, speculative/draft-based decoding, and approximate inference
- Parameter-efficient fine-tuning: LoRA, state space models, and Mamba-style architectures
- Diffusion LLMs, parallel decoding, and generative systems
- Post-training systems: on-policy distillation, asynchronous RL/OPD pipelines, stale rollout correction, teacher-cache constraints, and throughput–quality trade-offs
- Accelerator-aware LLM systems: rollout generation, inference pipelines, and custom hardware integration
Ongoing work:
- AsyncOPD: How Stale Can On-Policy Distillation Be? Studies stale rollouts, KL-direction sensitivity, teacher-cache constraints, estimator design, and throughput–quality trade-offs in asynchronous OPD pipelines (a toy sketch of the core objective follows below).
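To give a flavor of the objective this line of work builds on, here is a minimal, purely illustrative sketch of a token-level on-policy distillation loss (reverse KL between student and teacher on student-generated rollouts). It is not the AsyncOPD implementation; the function and variable names are hypothetical, and it assumes student and teacher logits have already been computed over the same rollout tokens.

```python
# Illustrative sketch only, assuming logits of shape [batch, seq_len, vocab]
# and a mask of shape [batch, seq_len] marking tokens scored by the loss.
import torch
import torch.nn.functional as F

def opd_loss(student_logits: torch.Tensor,
             teacher_logits: torch.Tensor,
             mask: torch.Tensor) -> torch.Tensor:
    """Token-level reverse KL(student || teacher), averaged over valid tokens."""
    log_p_student = F.log_softmax(student_logits, dim=-1)
    log_p_teacher = F.log_softmax(teacher_logits, dim=-1)
    # Reverse KL: expectation taken under the student distribution.
    kl = (log_p_student.exp() * (log_p_student - log_p_teacher)).sum(dim=-1)
    return (kl * mask).sum() / mask.sum().clamp(min=1)
```

In an asynchronous pipeline, the rollouts may have been generated by a slightly older student snapshot, which is where questions of stale-rollout correction, KL direction, and estimator design become relevant.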
Languages: German (native) · English (fluent) · Korean (professional, TOPIK 5)