Kevin Galim

Senior AI Research Engineer · FuriosaAI · Seoul, South Korea


I am a Senior AI Research Engineer at FuriosaAI, where I work on efficient LLM inference, post-training systems, and accelerator-aware training/inference pipelines. My research sits at the intersection of efficient inference, post-training, and AI systems for large language models.

I have authored 10+ publications at venues including ICLR, ICML, ACL, CVPR, ECCV, and WACV, with work spanning KV-cache and prompt/context compression, parameter-efficient adaptation for state space models, and diffusion LLMs with parallel decoding.

Before joining FuriosaAI, I worked on applied computer vision at Funzin, including autonomous golf cart perception and CES 2021 demos; GPU-accelerated image processing at ARRI in Munich; and freelance AR/web development. I received my M.Sc. in Informatics (Games Engineering) from the Technical University of Munich (grade 1.4), including a semester of research in computer graphics at the University of Tokyo.

Research interests:

  • Efficient LLM inference: KV-cache and prompt/context compression, speculative/draft-based decoding, and approximate inference
  • Parameter-efficient fine-tuning: LoRA, state space models, and Mamba-style architectures
  • Diffusion LLMs, parallel decoding, and generative systems
  • Post-training systems: on-policy distillation, asynchronous RL/OPD pipelines, stale rollout correction, teacher-cache constraints, and throughput–quality trade-offs
  • Accelerator-aware LLM systems: rollout generation, inference pipelines, and custom hardware integration

Ongoing work:

  • AsyncOPD: How Stale Can On-Policy Distillation Be? Studies stale rollouts, KL-direction sensitivity, teacher-cache constraints, estimator design, and throughput–quality trade-offs in asynchronous OPD pipelines.

Languages: German (native) · English (fluent) · Korean (professional, TOPIK 5)

selected publications

  1. Draft-based Approximate Inference for LLMs
     Kevin Galim*, Ethan Ewer*, Wonjun Kang, and 3 more authors
     In International Conference on Learning Representations (ICLR), 2026
  2. ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
     Wonjun Kang*, Kevin Galim*, Seunghyuk Oh*, and 8 more authors
     In International Conference on Learning Representations (ICLR), 2026
  3. Parameter-Efficient Fine-Tuning of State Space Models
     Kevin Galim*, Wonjun Kang*, Yuchen Zeng*, and 2 more authors
     In International Conference on Machine Learning (ICML), 2025