About me

I am a senior research engineer at Ant Group, where I currently work on LLM alignment and RL.

Before this, I worked at IBM Research AI as a research intern and collaborated with Pin-Yu Chen, Payel Das, Songtao Lu, Xiaodong Cui and many other talented researchers at IBM. My research at IBM focused on LLM alignment and offline RL. During the same period, I pursued my Ph.D. at RPI under the supervision of Dr. Tianyi Chen (now at Cornell Tech). I was fortunate to join Dr. Tianyi Chen’s group as its first Ph.D. student. My Ph.D. research focused on optimization and reinforcement learning.

News and highlights

[Sep. 2025] New paper on LLM-RL: AEnt. Its asynchronous implementation has been incorporated into the highly scalable RL framework AReaL.
- On Entropy Control in LLM-RL Algorithms [code]
[Mar. 2025] I am excited to join Ant Group via its research talent program, Ant Star.
[Feb. 2025] The extended version of our ICML 2024 paper has also been accepted to JMLR.
- ArXiv: Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
[Jan. 2025] Our paper has been accepted to ICLR 2025:
- SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection
[Dec. 2024] The extended version of our ICML 2023 paper has also been accepted to Mathematical Programming.
- On Penalty-based Bilevel Gradient Descent Method
[Oct. 2024] New paper on an improved LLM alignment framework:
- Mitigating Forgetting in LLM Supervised Fine-Tuning and Preference Learning

Selected works

SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection
Han Shen, Pin-Yu Chen, Payel Das, Tianyi Chen
ICLR 2025. [arxiv]
Mitigating Forgetting in LLM Supervised Fine-Tuning and Preference Learning
Heshan Fernando*, Han Shen*, Parikshit Ram, Yi Zhou, Horst Samulowitz, Nathalie Baracaldo, Tianyi Chen
*equal contribution, preprint. [arxiv]
Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
Han Shen, Zhuoran Yang, Tianyi Chen
ICML 2024, extended work in JMLR [arxiv]
Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Approach
Heshan D. Fernando, Han Shen, Miao Liu, Subhajit Chaudhury, Keerthiram Murugesan, Tianyi Chen
ICLR 2023 oral. [arxiv]
On Penalty-based Bilevel Gradient Descent Method
Han Shen, Quan Xiao, Tianyi Chen
ICML 2023, extended work in Mathematical Programming. [MAPR]

Industry experience

Ant Group. (CN) Present

Senior research engineer, joined via the Ant Star talent program.

IBM Research AI. (US) 05.2024 - 08.2024

Research intern, mentored by Dr. Pin-Yu Chen and managed by Dr. Payel Das.

IBM Research AI. (US) 05.2021 - 08.2021

Research intern, mentored by Dr. Songtao Lu and Dr. Xiaodong Cui.

Services

Reviewer for NeurIPS, ICML, ICLR, AISTATS, AAAI and IEEE Transactions on Signal Processing (TSP).

Han Shen
