Hao Li

Professor


Graduate Student Supervision

Doctoral Student Supervision

Dissertations completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest dissertations.

Algorithmic learning in games (2024)

This dissertation studies algorithmic agents that interact repeatedly in strategic settings.

Chapter 2 provides asymptotic results for a family of reinforcement learning algorithms known as ‘actor-critic learners’. Each such algorithmic agent simultaneously estimates what is called a ‘critic’, such as a value function, and updates its policy, which is referred to as the ‘actor’. The critic is used to indicate directions of improvement for the actor. I establish sufficient conditions for the consistency of each agent’s parametric critic estimator, which enables them to adapt and find optimal responses despite the non-stationarity inherent to multi-agent settings. The conditions depend on the environment, the number of observations used in the critic estimation, and the policy stepsize.

Chapter 3 presents an analytical characterization of the long-run policies learned by algorithmic agents in the multi-agent setting. The algorithms studied here form a superset of the family considered in Chapter 2. These algorithms update policies, which are maps from observed states to actions. I show that the long-run policies correspond to equilibria that are stable points of a tractable differential equation.

In Chapter 4, I consider algorithmic agents playing a repeated Cournot game of quantity competition. In this setting, learning the stage-game Nash equilibrium serves as a non-collusive benchmark. I give necessary and sufficient conditions for this Nash equilibrium not to be learned. These conditions are requirements on the state variables of the algorithms and on the stage game. When algorithms determine actions based only on the past period’s price, the Nash equilibrium can be learned. However, agents may condition their actions on richer state variables beyond the past period’s price. In that case, I give sufficient conditions under which the policies converge to a collusive equilibrium with positive probability, while never converging to the Nash equilibrium.
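The actor-critic structure described above can be illustrated with a minimal, hypothetical sketch (not the dissertation's actual algorithm): the agent maintains a critic (here, a running estimate of average reward, used as a baseline) and an actor (a softmax policy), and the critic's error signal drives the direction of the policy update. The two-action reward environment below is invented purely for illustration.

```python
import math
import random

random.seed(0)

# Hypothetical two-action environment: action 1 pays more on average.
def reward(action):
    mean = 1.0 if action == 1 else 0.5
    return mean + random.gauss(0, 0.1)

theta = [0.0, 0.0]   # actor: softmax preferences over the two actions
v = 0.0              # critic: running estimate of average reward (the baseline)
alpha_actor, alpha_critic = 0.05, 0.1

def policy():
    """Softmax over preferences theta."""
    exps = [math.exp(t) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

for _ in range(5000):
    probs = policy()
    a = 0 if random.random() < probs[0] else 1
    r = reward(a)
    delta = r - v                # critic's error: was this reward better than expected?
    v += alpha_critic * delta    # critic update
    # Actor update: raise the log-probability of actions the critic judged
    # better than average, lower it for worse-than-average actions.
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - probs[i]
        theta[i] += alpha_actor * delta * grad

print(policy())  # the actor comes to strongly prefer the better action
```

In a multi-agent version of this loop, each agent's environment includes the other learners, so the reward distribution shifts as opponents update; the consistency conditions in Chapter 2 concern when the critic can still be estimated reliably despite that non-stationarity.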


 

Membership Status

Member of G+PS

