May 9 | Fan Zhou: Distributional Off-Policy Evaluation with Deep Quantile Process Regression

Source: School of Statistics | Published: 2025-05-06

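Time: Friday, May 9, 2025, 16:00-17:00

Venue: Room A1514, Science Building

Speaker: Fan Zhou, Associate Professor, Shanghai University of Finance and Economics

Host: Yanlin Tang, Professor, East China Normal University

Abstract: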

This paper investigates the off-policy evaluation (OPE) problem from a distributional perspective, with the aim of modeling the entire distribution of total returns rather than focusing solely on estimating the expectation, as most existing OPE methods do. Specifically, we are the first to explore quantile-based methods for OPE through quantile process regression, introducing a novel algorithm called Quantile Process Regression Off-Policy Evaluation (QPOPE). We provide new theoretical insights into the quantile process regression technique, extending existing approaches, which estimate only a finite set of discrete quantiles, to the estimation of a continuous quantile function. A key contribution of our work is the first rigorous sample complexity analysis for distributional reinforcement learning with deep neural networks, bridging theoretical analysis with practical algorithmic implementations. We show that QPOPE achieves statistical efficiency: it estimates the full return distribution using the same sample size that conventional methods need to estimate a single mean value. Additionally, our empirical studies illustrate that QPOPE provides significantly more precise and robust mean value estimates than standard methods, thereby enhancing the practical applicability and effectiveness of distributional reinforcement learning approaches.
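The abstract contrasts estimating a single mean with estimating an entire, continuous quantile function. As a toy illustration only (this is not QPOPE: there are no covariates, no Bellman recursion, no deep network, and the returns are simulated), the Python sketch below minimizes the standard pinball (check) loss at each level on a grid of quantile levels τ, interpolates the grid into a continuous quantile function Q(τ), and recovers the mean through the standard identity E[Z] = ∫₀¹ Q(τ) dτ. All function names are hypothetical, and linear interpolation is an assumed choice for making the grid continuous.

```python
import numpy as np


def pinball_loss(residuals, tau):
    """Average check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(residuals, dtype=float)
    return float(np.mean(u * (tau - (u < 0))))


def fit_quantile_grid(returns, taus):
    """Minimize the empirical pinball loss separately at each level tau.

    With no covariates, the order statistic z_(ceil(n * tau)) is a minimizer.
    """
    z = np.sort(np.asarray(returns, dtype=float))
    n = z.size
    idx = np.ceil(n * np.asarray(taus, dtype=float)).astype(int) - 1
    return z[np.clip(idx, 0, n - 1)]


def quantile_function(returns, num_levels):
    """Interpolate grid estimates into a continuous, nondecreasing Q(tau).

    np.interp holds Q constant below the first and above the last grid level.
    """
    taus = (np.arange(num_levels) + 0.5) / num_levels  # midpoints of (0, 1)
    q = fit_quantile_grid(returns, taus)
    return (lambda t: np.interp(t, taus, q)), taus, q


def mean_from_quantiles(q_values):
    """Midpoint-rule approximation of E[Z] = integral_0^1 Q(tau) d tau."""
    return float(np.mean(q_values))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    returns = rng.normal(loc=1.0, scale=2.0, size=1000)  # simulated, illustrative
    Q, taus, q = quantile_function(returns, num_levels=returns.size)
    print(mean_from_quantiles(q), returns.mean())  # equal up to rounding here
    print(Q(0.5), Q(0.95))  # Q can be queried at any level in (0, 1)
```

With `num_levels` equal to the sample size, the midpoint grid selects every order statistic exactly once, so the integrated quantile function reproduces the sample mean; a coarser grid gives only an approximation.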

Speaker Bio:

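Fan Zhou is an Associate Professor at the School of Statistics and Management, Shanghai University of Finance and Economics. He received his PhD from the Department of Biostatistics at the University of North Carolina at Chapel Hill and currently serves as an Associate Editor of JASA: ACS (Journal of the American Statistical Association: Applications and Case Studies), a leading statistics journal. His research interests include combining classical statistical ideas with machine learning frameworks such as deep learning and reinforcement learning to address frontier theoretical and methodological problems in artificial intelligence. He has published dozens of papers in leading statistics and machine learning journals and conferences, including JASA, JMLR, Nature Genetics, NeurIPS, ICML and ICLR. His honors include the New Researcher Award of the International Chinese Statistical Association (ICSA) and the Barry H. Margolin Award from the UNC Department of Biostatistics, and he was selected for the Shanghai Oriental Talent Youth Program.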