School of Computer Science Lecture Series, Elite Forum No. 40: Structure-driven design of reinforcement learning algorithms: a tale of two estimators

           

Title: Structure-driven design of reinforcement learning algorithms: a tale of two estimators

 

Date & Time: 2024.12.20 (Friday), 15:00

 

Location: Room 813, Yanyuan Building #1 (Yanyuan Campus)

 

Speaker: Wenlong Mou (牟文龙)

 

Host: Xuanzhe Liu (刘譞哲)

 

Abstract

Reinforcement learning (RL) is emerging as a powerful tool for adaptive decision-making in dynamic environments. A key challenge in RL is learning value functions efficiently, which plays a critical role in optimizing decision policies. Over the years, a diverse range of RL algorithms has been proposed, but at their core, two foundational principles stand out: bootstrapping and rollout. Despite their success, finding the optimal trade-off between these principles in practical applications remains elusive, with current theoretical guarantees often falling short of providing actionable insights.

 

In this talk, I will discuss recent advances in methods that optimally reconcile bootstrapping and rollout for policy evaluation. The bulk of this talk will focus on a new class of algorithms that strikes an optimal balance between temporal difference learning and Monte Carlo methods. Through the statistical lens, I will highlight how the local structure of the underlying Markov chain influences the complexity of these problems, and how the new algorithm adapts to these structures. Extending this perspective to continuous-time RL, I will explore how the elliptic structure of diffusion processes provides key insights for making algorithmic choices.

 

Speaker Bio

 

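Wenlong Mou is currently an Assistant Professor in the Department of Statistical Sciences at the University of Toronto. He received his Ph.D. in Electrical Engineering and Computer Sciences from the University of California, Berkeley in 2023, and graduated in 2017 from the School of Electronics Engineering and Computer Science at Peking University with a bachelor's degree in Computer Science and a double degree in Economics. His research focuses on theory and algorithms in machine learning and data science, with a recent emphasis on machine learning methods for data-driven decision-making problems. His work has been published in top journals and conferences in machine learning, statistics, and operations research, and he was nominated for the best student paper award in applied probability by the international operations research society (国际运筹学会).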

 

Follow the School of Computer Science's WeChat official account for more lecture information!

 

School of Computer Science, Peking University