Welcome!

I’m Minseo Choi, a sophomore studying Computer Science and Computer Engineering at Johns Hopkins University.

I’m deeply interested in ML systems, especially the intersection of machine learning, compilers, and GPU performance engineering. Rather than focusing only on model accuracy, I care about how models run in practice: how inference can be made faster, cheaper, and more scalable through better systems design.

What I’m currently working on

My recent work mostly lives on the low-level side of ML systems:

  • LLM Serving & Inference Infrastructure: building an LLM-based log analysis agent using frameworks like Ray Serve and vLLM, while studying batching, memory management, and latency–throughput trade-offs in real-world serving systems.

  • GPU Kernel & Performance Engineering: implementing core algorithms such as FlashAttention from scratch in CUDA and Triton to understand memory access patterns, kernel fusion, and performance bottlenecks at a deeper level.

  • ML Compilers & IR Design: working through LLVM’s Kaleidoscope tutorial and experimenting with toy compiler pipelines, from parsing and ASTs to IR generation and JIT execution, as a foundation for future ML compiler projects.

  • Applied ML Optimization: competing in ML competitions and hackathons, where I focused on handling data imbalance, optimization strategies, and practical model deployment under constraints.
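If you’re curious what the FlashAttention work is about: the core idea is an online softmax over key/value tiles, so the full N×N score matrix never has to be materialized. Here’s a minimal NumPy sketch of that idea (purely illustrative; the function name and block size are my own, and the real implementations live in CUDA/Triton):

```python
import numpy as np

def flash_attention(Q, K, V, block=2):
    """FlashAttention-style tiled attention with an online softmax.

    Processes K/V in blocks of `block` rows, keeping a running row-wise
    max `m` and softmax denominator `l`, so the full N x N score matrix
    is never materialized.
    """
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q, dtype=np.float64)  # unnormalized output accumulator
    m = np.full(N, -np.inf)                 # running max of scores per query row
    l = np.zeros(N)                         # running softmax denominator

    for j in range(0, K.shape[0], block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = (Q @ Kj.T) * scale              # scores for this block only
        m_new = np.maximum(m, S.max(axis=1))
        alpha = np.exp(m - m_new)           # rescale old accumulators to new max
        P = np.exp(S - m_new[:, None])      # block-local softmax numerators
        l = l * alpha + P.sum(axis=1)
        O = O * alpha[:, None] + P @ Vj
        m = m_new

    return O / l[:, None]                   # normalize once at the end
```

The output matches a naive softmax(QKᵀ/√d)V, but peak memory scales with the block size instead of the sequence length, which is exactly the property that makes the fused GPU kernel fast.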

What I’m aiming for

Long-term, I want to work on:

  • ML inference engines,
  • GPU-accelerated systems,
  • and compiler-driven optimizations that make modern ML models practical at scale.

I’m still early in this journey, but I enjoy being in the phase where there’s always more to learn, and more things to break and fix along the way.

Contact

Feel free to reach out via LinkedIn or check my work on GitHub.

Resume