Haoqi Yuan (袁昊琦)

I am a third-year Ph.D. student at Peking University, advised by Prof. Zongqing Lu. I received my Bachelor's degree from the Turing Class at Peking University in 2021.

My research interests primarily lie in reinforcement learning (RL), representation learning, and generative modeling. Currently, I am focusing on: (1) pre-training methodologies for RL; (2) integrating foundation models with RL.

Email  /  Google Scholar  /  Github

Papers
RL-GPT: Integrating Reinforcement Learning and Code-as-policy
Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia.
2023
arXiv / project page / bibtex

RL-GPT equips Large Language Models (LLMs) with Reinforcement Learning (RL) tools, empowering LLM agents to solve challenging tasks in complex, open-world environments. It employs a hierarchical framework: a slow LLM agent plans subtasks and selects the proper tool (RL or code-as-policy) for each; a fast LLM agent instantiates RL training pipelines or generates code to learn the subtasks. The LLM agents perform efficient self-improvement via trial and error. RL-GPT solves diverse Minecraft tasks with great efficiency, obtaining a Diamond with an 8% success rate within 3M environment steps.

Pre-Training Goal-Based Models for Sample-Efficient Reinforcement Learning
Haoqi Yuan, Zhancun Mu, Feiyang Xie, Zongqing Lu.
ICLR (oral, acceptance rate: 1.2%), 2024
conference paper / bibtex

PTGM pre-trains goal-based models on task-agnostic datasets to accelerate downstream RL. The pre-trained models provide: (1) a low-level, goal-conditioned policy that can perform diverse short-term behaviors; (2) a discrete high-level action space consisting of clustered goals from the dataset; (3) a goal prior model that guides and stabilizes downstream RL training of the high-level policy. PTGM scales to the complex Minecraft domain with large datasets, showing strong sample efficiency, task performance, interpretability, and generalization of the acquired low-level skills.

Creative Agents: Empowering Agents with Imagination for Creative Tasks
Chi Zhang, Penglin Cai, Yuhui Fu, Haoqi Yuan, Zongqing Lu.
2023
arXiv / project page / bibtex

Creative tasks, in which the agent must produce novel and diverse solutions, are challenging for open-ended agents. We propose creative agents endowed with imagination and introduce several implementation variants. We benchmark creative tasks in the challenging open-world game Minecraft and propose novel evaluation metrics utilizing GPT-4V. Creative agents are the first AI agents to accomplish diverse building creation in Minecraft survival mode.

Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks
Haoqi Yuan, Chi Zhang, Hongcheng Wang, Feiyang Xie, Penglin Cai, Hao Dong, Zongqing Lu.
NeurIPS FMDM Workshop, 2023
workshop paper / arXiv / project page / bibtex

Plan4MC is a multi-task agent for open-world Minecraft that solves long-horizon tasks via planning over basic skills. It acquires three types of fine-grained basic skills through reinforcement learning without demonstrations. With a skill graph pre-generated by a Large Language Model, the skill search algorithm plans and interactively selects policies to solve complicated tasks. Plan4MC accomplishes 40 diverse tasks in Minecraft and unlocks the Iron Pickaxe in the Tech Tree.

Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning
Haoqi Yuan, Zongqing Lu.
ICML, 2022
conference paper / project page / bibtex

Offline meta-RL is a data-efficient RL paradigm that learns from offline data to adapt to new tasks. We propose a contrastive learning framework for robust task representations in context-based offline meta-RL. Our method improves the adaptation performance on unseen tasks, especially when the context is out-of-distribution.

DMotion: Robotic Visuomotor Control with Unsupervised Forward Model Learned from Videos
Haoqi Yuan, Ruihai Wu, Andrew Zhao, Haipeng Zhang, Zihan Ding, Hao Dong.
IROS, 2021
conference paper / arXiv / project page / bibtex

We study learning world models from action-free videos. Our unsupervised method leverages spatial transformers to disentangle the motion of the controllable agent and learns a forward model conditioned on an explicit representation of actions. Using a few samples labelled with ground-truth actions, our method achieves superior performance on video prediction and model predictive control tasks.

DLGAN: Disentangling Label-Specific Fine-Grained Features for Image Manipulation
Guanqi Zhan, Yihao Zhao, Bingchan Zhao, Haoqi Yuan, Baoquan Chen, Hao Dong.
2020
arXiv / bibtex

DLGAN is the first work to utilize discrete multi-labels to control which features are disentangled, enabling interpolation between two domains without continuous labels. It is an end-to-end method that supports image manipulation conditioned on both images and labels, enabling smooth and immediate changes simultaneously.

Experience
Beijing Academy of Artificial Intelligence (BAAI)
Intern
2023-now
Hyperplane Lab
Intern. Advised by Prof. Hao Dong.
2019-2021
Education
School of Computer Science, Peking University
Ph.D. student
Advised by Prof. Zongqing Lu
2021-now
Turing Class, Peking University
Undergraduate student
2017-2021
Services

Reviewer:
ICML 2022, 2024; NeurIPS 2022, 2023; ICLR 2024; AAAI 2023, 2024; CVPR 2024.

Teaching Assistant:
Deep Reinforcement Learning, Zongqing Lu, 2023 Spring
Computational Thinking in Social Science, Xiaoming Li, 2020 Autumn
Deep Generative Models, Hao Dong, 2020 Spring

Awards

2022: NERCVT Outstanding Student of the Year

2022: Peking University President Scholarship

2021: Peking University President Scholarship

2020: John Hopcroft Scholarship

2019: Peking University Turing Class Scholarship

2019: Meritorious Winner, Mathematical Contest in Modeling

2018: National Scholarship

2018: Peking University Merit Student Pacesetter

2016: Second Class Award in the 33rd Chinese Physics Olympiad (Finals)