About me

I am Jun Rao, a third-year Ph.D. candidate at the School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), co-advised by Min Zhang and Xuebo Liu. I have published several papers at top conferences and in international journals (ACM/IEEE Transactions), including ACL, EMNLP, SIGIR, MM, TMM, CIKM, and IPM. I have a strong preference for methods that are simple, intuitive, and fun. My long-term research goal is to build socially intelligent embodied agents that can perceive and engage in multimodal human communication. As steps toward this goal, my research focuses on:

1) the fundamentals of multimodal learning, specifically the representation, translation, fusion, and alignment of heterogeneous data sources,

2) human-centered language, vision, and their applications,

3) efficiency in real-world deployment, covering both the computation and the data required for (pre- and post-)training and for using LLMs.

Publications

  • APT: Improving Specialist LLM Performance with Weakness Case Acquisition and Iterative Preference Training

    Jun Rao, Zepeng Lin, Xuebo Liu, Lian Lian, Dong Jin, Shengjun Cheng, Jun Yu, and Min Zhang

    Findings of ACL 2025

  • CommonIT: Commonality-Aware Instruction Tuning for Large Language Models via Data Partitions

    Jun Rao, Xuebo Liu, Lian Lian, Shengjun Cheng, Yunjie Liao, and Min Zhang

    EMNLP 2024

  • What Is the Limitation of Multimodal LLMs? A Deeper Look into Multimodal LLMs through Prompt Probing

    Shuhan Qi, Zhengying Cao, Jun Rao*, Lei Wang, Jing Xiao, Xuan Wang

    IPM 2023, Corresponding Author

  • Parameter-Efficient and Student-Friendly Knowledge Distillation

    Jun Rao, Xv Meng, Liang Ding, Shuhan Qi, Xuebo Liu, Min Zhang, Dacheng Tao

    TMM 2023

  • Dynamic Contrastive Distillation for Image-Text Retrieval

    Jun Rao, Liang Ding, Shuhan Qi, Meng Fang, Yang Liu, Li Shen, Dacheng Tao

    TMM 2023

  • Where Does the Performance Improvement Come From? A Reproducibility Concern about Image-Text Retrieval

    Jun Rao, Fei Wang, Liang Ding, Shuhan Qi, Yibing Zhan, Weifeng Liu, Dacheng Tao

    SIGIR 2022

  • Student Can Also Be a Good Teacher: Extracting Knowledge from Vision-and-Language Model for Cross-Modal Retrieval

    Jun Rao, Tao Qian, Shuhan Qi, Yulin Wu, Qing Liao, Xuan Wang

    CIKM 2021

Topics of Interest

  • Training strategy
    • CommonIT [EMNLP 24, CCF B]
    • Curriculum consistency learning for conditional sentence generation. EMNLP 2024.
  • Data synthesis
    • APT [ACL 25, CCF A]
    • AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs. EMNLP 2025.
  • Reasoning
    • REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Large Reasoning Models. 2025.
    • Dynamic Sampling that Adapts: Iterative DPO for Self-Aware Mathematical Reasoning. 2025.
  • Knowledge Distillation
    • PESF-KD [TMM 23, JCR Q1]
    • DCD [TMM 23, JCR Q1]
    • Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models. 2024.
  • Evaluation
    • Where Does the Performance Improvement Come From? A Reproducibility Concern about Image-Text Retrieval. [SIGIR 22, CCF A]
    • What Is the Limitation of Multimodal LLMs? A Deeper Look into Multimodal LLMs through Prompt Probing. [IPM 23, JCR Q1]
    • Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining? [TOMM 23, JCR Q1]
    • 3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset. [COLING, CCF B]
    • MDIT-Bench: Evaluating the Dual-Implicit Toxicity in Large Multimodal Models. [ACL 25, CCF A]

Internship Experience

Personal information

I enjoy playing table tennis. My results include eighth place in the team event of the Guangdong Provincial Games for college students, fourth place in mixed doubles at the Shenzhen District Competition, fifth place at the Fifth Shenzhen Cup Amateur Open, 16th place in the singles and team events of the Shenzhen Municipal Games (mass group), and second place in the team event of Huawei Table Tennis 2024.