Minbyul Jeong

Publications

Agents / Training Data / Factuality & Hallucination
Searching after Answering: Diagnosing and Repairing Over-Reflection in Search Agents
Minbyul Jeong
Preprint.

project page / code

Search agents find the correct answer, verify it through search, and then keep searching for many more rounds before reporting it. We name this over-reflection and show it is structural rather than incidental: 62.8% of verifiable trajectories continue past confirmation while the standard leakage detector flags only 21.1%. Four data-level repairs of one corpus converge to identical stopping behavior — editing the trajectories a policy imitates does not edit its stop decision.

Question Answering / Benchmark Datasets / Agents / Factuality & Hallucination
OpenBioRQ: Unsolved Biomedical Research Questions for Agents
Minbyul Jeong
Preprint.

arXiv / code / project page / dataset

We present OpenBioRQ, an agentic benchmark of 12,553 genuinely unsolved biomedical research questions across 12 domains. Without fixed answer keys, it scores agents on retrieval-grounded tool use and citation faithfulness, exposing wrong-paper citations (15.9%) and "agentic collapse" that closed-form medical QA cannot reveal.

Benchmark Datasets / Agents / Question Answering
Ko-WideSearch: A Korean Breadth-Search Benchmark for Web Agents
Minbyul Jeong
Preprint.

project page / code / dataset

Web-agent benchmarks mostly measure depth — pinning one obscure answer behind a chain of constraints. Ko-WideSearch measures breadth: enumerate every member of a closed set and fill each item's attributes. Agents recover set membership well (92.8% Item-F1) but fail to complete rows (53.7% Row-F1).

Agents / Reinforcement Learning / Question Answering
Healthcare AI GYM for Medical Agents
Minbyul Jeong
Preprint.

arXiv / code / project page

We present a gymnasium-compatible environment with 10 clinical domains and 3.6K+ tasks for training medical agents via multi-turn reinforcement learning. We introduce Turn-level Truncated On-Policy Distillation (TT-OPD) to stabilize training and improve multi-turn clinical reasoning.

Chronological Knowledge / Question Answering / Interpretability
Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information
Yein Park, Chanwoong Yoon, Jungwoo Park, Minbyul Jeong, Jaewoo Kang
ACL 2025.

arXiv / code

Through circuit analysis, we discover Temporal Heads: specific attention heads that are primarily responsible for processing temporal knowledge.

Question Answering / System Generation / Steering LLM's behavior
System Message Generation for User Preferences using Open-Source Models
Minbyul Jeong, Jungho Cho, Minsoo Khang, Dawoon Jung, Teakgyu Hong
Preprint.

arXiv

We present SysGen, a pipeline that generates system messages, together with better-aligned assistant responses, from supervised fine-tuning datasets that lack system messages.

Chronological Knowledge / Question Answering
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
Yein Park, Chanwoong Yoon, Jungwoo Park, Donghyeon Lee, Minbyul Jeong, Jaewoo Kang
ICLR 2025.

OpenReview / code

We present CHROKNOWLEDGE (Chronological Categorization of Knowledge), a novel sampling-based framework for evaluating LLMs’ non-parametric chronological knowledge.

Question Answering / Benchmark Datasets / Factuality & Hallucination
OLAPH: Improving Factuality in Biomedical Long-form Question Answering
Minbyul Jeong, Hyeon Hwang, Chanwoong Yoon, Taewhoo Lee, Jaewoo Kang
Preprint.

arXiv / code / YouTube

We present MedLFQA, a benchmark dataset reconstructed from long-form question-answering datasets in the biomedical domain. We also introduce OLAPH, a framework that leverages automatic evaluation to generate synthetic preference sets, which help align the model with preferred responses.

Question Answering / Retrieval-Augmented Generation / Instruction-tuned LLM
Self-BioRAG: Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models
Minbyul Jeong, Jiwoong Sohn, Mujeen Sung, Jaewoo Kang
ISMB 2024.

arXiv / code

We present Self-BioRAG, a domain-specific LLM version of the Self-RAG framework.

Named Entity Recognition / Consistency
ConNER: Consistency enhancement of model prediction on document-level named entity recognition
Minbyul Jeong, Jaewoo Kang
Bioinformatics 2023.

paper / code

We present ConNER, a biomedical NER training framework that enhances label consistency in document-level context.

Named Entity Recognition / Named Entity Normalization
BERN2: an advanced neural biomedical named entity recognition and normalization tool
Mujeen Sung*, Minbyul Jeong*, Yonghwa Choi, Donghyeon Kim, Jinhyuk Lee, Jaewoo Kang
Bioinformatics Appnote 2022.

demo / paper / code

We present BERN2, a biomedical NER and NEN framework that automatically extracts biomedical entities from biomedical literature. It takes only 0.3 seconds per document.

Graph-based Learning
Graph Transformer Networks: Learning Meta-path Graphs to Improve GNNs
Seongjun Yun, Minbyul Jeong, Sungdong Yoo, Seunghun Lee, Sean S. Yi, Raehyun Kim, Jaewoo Kang, Hyunwoo J. Kim
Neural Networks 2022.

paper / code

We present FastGTN, a network that improves the scalability of graph transformations over the previous version of Graph Transformer Networks (GTN).

Graph-based Learning
Graph Transformer Networks
Seongjun Yun, Minbyul Jeong, Raehyun Kim, Jaewoo Kang, Hyunwoo J. Kim
NeurIPS 2019.

paper / code

We present GTN, a network for graph transformations to enhance node representations.

Full list from Google Scholar. Bold denotes my name; * equal contribution, † corresponding author.

2026

Searching after Answering: Diagnosing and Repairing Over-Reflection in Search Agents. Minbyul Jeong. Preprint, 2026. [project] [code]
Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents. Minbyul Jeong. arXiv preprint, 2026. [project] [code] [dataset]
OpenBioRQ: Unsolved Biomedical Research Questions for Agents. Minbyul Jeong. arXiv preprint, 2026. [arXiv] [code] [project] [dataset]
Healthcare AI GYM for Medical Agents. Minbyul Jeong. arXiv preprint, 2026. [arXiv] [code] [project]
Solar Open Technical Report. Solar AI Tech Team. arXiv preprint, 2026. [arXiv] [model]
User-Oriented Multi-Turn Dialogue Generation with Tool Use at Scale. Jungho Cho, Minbyul Jeong, Sungrae Park. arXiv preprint, 2026. [arXiv]

2025

Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information. Yein Park, Chanwoong Yoon, Jungwoo Park, Minbyul Jeong†, Jaewoo Kang†. ACL 2025. [arXiv] [code]
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains. Yein Park, Chanwoong Yoon, Jungwoo Park, Donghyeon Lee, Minbyul Jeong†, Jaewoo Kang†. ICLR 2025. [arXiv] [OpenReview] [code]
Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training. Yein Park, Minbyul Jeong†, Jaewoo Kang†. arXiv preprint, 2025. [arXiv]
Trustworthy Agents for Electronic Health Records through Confidence Estimation. Yongwoo Song, Minbyul Jeong†, Mujeen Sung†. arXiv preprint, 2025. [arXiv]
System Message Generation for User Preferences using Open-Source Models. Minbyul Jeong†, Jungho Cho, Minsoo Khang, Dawoon Jung, Teakgyu Hong. arXiv preprint, 2025. [arXiv]

2024

CompAct: Compressing Retrieved Documents Actively for Question Answering. Chanwoong Yoon, Taewhoo Lee, Hyeon Hwang, Minbyul Jeong†, Jaewoo Kang†. EMNLP 2024. [arXiv] [code]
Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models. Minbyul Jeong, Jiwoong Sohn, Mujeen Sung, Jaewoo Kang. ISMB 2024 (Bioinformatics). [arXiv] [code]
OLAPH: Improving Factuality in Biomedical Long-form Question Answering. Minbyul Jeong, Hyeon Hwang, Chanwoong Yoon, Taewhoo Lee, Jaewoo Kang. arXiv preprint, 2024. [arXiv] [code] [video]

2023

Consistency Enhancement of Model Prediction on Document-level Named Entity Recognition. Minbyul Jeong, Jaewoo Kang. Bioinformatics 2023. [paper] [code]

2022

BERN2: An Advanced Neural Biomedical Named Entity Recognition and Normalization Tool. Mujeen Sung*, Minbyul Jeong*, Yonghwa Choi, Donghyeon Kim, Jinhyuk Lee, Jaewoo Kang. Bioinformatics 2022. [paper] [demo] [code]
Graph Transformer Networks: Learning Meta-path Graphs to Improve GNNs. Seongjun Yun, Minbyul Jeong, Sungdong Yoo, Seunghun Lee, Sean S. Yi, Raehyun Kim, Jaewoo Kang, Hyunwoo J. Kim. Neural Networks 2022. [paper] [code]
Data-Centric and Model-Centric Approaches for Biomedical Question Answering. Wonjin Yoon, Jaehyo Yoo, Sumin Seo, Mujeen Sung, Minbyul Jeong, Gangwoo Kim, Jaewoo Kang. CLEF 2022. [paper]
Pandemics are Catalysts of Scientific Novelty: Evidence from COVID-19. Meijun Liu, Yi Bu, Chongyan Chen, …, Minbyul Jeong, …, Ying Ding. JASIST 2022. [paper] [arXiv]

2021

Regularization for Long Named Entity Recognition. Minbyul Jeong, Jaewoo Kang. arXiv preprint, 2021. [arXiv]

2020

Transferability of Natural Language Inference to Biomedical Question Answering. Minbyul Jeong, Mujeen Sung, Gangwoo Kim, Donghyeon Kim, Wonjin Yoon, Jaehyo Yoo, Jaewoo Kang. CLEF 2020. [arXiv]
Answering Questions on COVID-19 in Real-Time. Jinhyuk Lee, Sean S. Yi, Minbyul Jeong, Mujeen Sung, Wonjin Yoon, Yonghwa Choi, Miyoung Ko, Jaewoo Kang. NLP-COVID Workshop @ EMNLP 2020. [arXiv] [code]
Building a PubMed Knowledge Graph. Jian Xu, Sunkyu Kim, Min Song, Minbyul Jeong, …, Ying Ding. Scientific Data 2020. [paper]

2019

Graph Transformer Networks. Seongjun Yun, Minbyul Jeong, Raehyun Kim, Jaewoo Kang, Hyunwoo J. Kim. NeurIPS 2019. [paper] [code]
Pre-trained Language Model for Biomedical Question Answering. Wonjin Yoon, Jinhyuk Lee, Donghyeon Kim, Minbyul Jeong, Jaewoo Kang. ECML PKDD 2019. [paper]
BERN: A Neural Named Entity Recognition and Multi-type Normalization Tool for Biomedical Text Mining. Donghyeon Kim, Jinhyuk Lee, Chan Ho So, Hwisang Jeon, Minbyul Jeong, Yonghwa Choi, Wonjin Yoon, Mujeen Sung, Jaewoo Kang. IEEE Access 2019. [paper] [code]
HATS: A Hierarchical Graph Attention Network for Stock Movement Prediction. Raehyun Kim, Chan Ho So, Minbyul Jeong, Sanghoon Lee, Jinkyu Kim, Jaewoo Kang. arXiv preprint, 2019. [arXiv]
