Hi! I am Minbyul Jeong, a Ph.D. at Korea University under the supervision of Professor Jaewoo Kang.
I'm always passionate about solving real-world problems.
My primary goal is to enable Artificial Intelligence to help people around the world lead better lives.
Ultimately, I aim to develop personalized healthcare AI systems that can diagnose and cure individual diseases, empowering everyone to manage their own health effectively.
Toward this goal, I currently build search and tool-using AI agents that retrieve evidence, reason over multiple turns, and act reliably on real-world tasks.
Email  /  CV  /  Google Scholar  /  LinkedIn  /  X  /  Github  /  Youtube
We present OpenBioRQ, an agentic benchmark of 12,553 genuinely unsolved biomedical research questions across 12 domains. Without fixed answer keys, it scores agents on retrieval-grounded tool use and citation faithfulness, exposing wrong-paper citations (15.9%) and "agentic collapse" that closed-form medical QA cannot reveal.
Web-agent benchmarks mostly measure depth — pinning one obscure answer behind a chain of constraints. Ko-WideSearch measures breadth: enumerate every member of a closed set and fill each item's attributes. Agents recover set membership well (92.8% Item-F1) but fail to complete rows (53.7% Row-F1).
We present a gymnasium-compatible environment with 10 clinical domains and 3.6K+ tasks for training medical agents via multi-turn reinforcement learning. We introduce Turn-level Truncated On-Policy Distillation (TT-OPD) to stabilize training and improve multi-turn clinical reasoning.
Through circuit analysis, we discover Temporal Heads, specific attention heads primarily responsible for processing temporal knowledge.
We present SysGen, a pipeline for generating system messages with better-aligned assistant responses from supervised fine-tuning datasets that lack system messages.
We present CHROKNOWLEDGE (Chronological Categorization of Knowledge), a novel sampling-based framework for evaluating LLMs’ non-parametric chronological knowledge.
We present MedLFQA, a benchmark dataset reconstructed using long-form question-answering. We also introduce OLAPH, a framework that leverages automatic evaluation to generate synthetic preference sets that can help align the model with preferred responses.
We present Self-BioRAG, a domain-specific LLM version of the Self-RAG framework.
We present ConNER, a biomedical NER training framework to enhance label consistency in document-level context.
We present BERN2, a biomedical NER and NEN framework that automatically extracts biomedical entities from biomedical literature. It takes only 0.3 seconds per document.
We present FastGTN, a network that improves the scalability of graph transformations over the previous version of Graph Transformer Networks (GTN).
We present GTN, a network for graph transformations to enhance node representations.
Template based on Jon Barron's website.