Hao Li
Relevant Thesis-Based Degree Programs
Graduate Student Supervision
Doctoral Student Supervision
Dissertations completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest dissertations.
This dissertation studies algorithmic agents that interact repeatedly in strategic settings.

Chapter 2 provides asymptotic results for a family of reinforcement learning algorithms known as 'actor-critic learners'. Each such algorithmic agent simultaneously estimates what is called a 'critic', such as a value function, and updates its policy, which is referred to as the 'actor'. The critic is used to indicate directions of improvement for the actor. I establish sufficient conditions for the consistency of each agent's parametric critic estimator, which enables the agents to adapt and find optimal responses despite the non-stationarity inherent to multi-agent settings. The conditions depend on the environment, the number of observations used in the critic estimation, and the policy stepsize.

Chapter 3 presents an analytical characterization of the long-run policies learned by algorithmic agents in the multi-agent setting. The algorithms studied here form a superset of the family considered in Chapter 2. These algorithms update policies, which are maps from observed states to actions. I show that the long-run policies correspond to equilibria that are stable points of a tractable differential equation.

In Chapter 4, I consider algorithmic agents playing a repeated Cournot game of quantity competition. In this setting, learning the stage game Nash equilibrium serves as a non-collusive benchmark. I give necessary and sufficient conditions for this Nash equilibrium not to be learned. These conditions are requirements on the state variables of the algorithms and on the stage game. When algorithms determine actions based only on the past period's price, the Nash equilibrium can be learned. However, agents may condition their actions on richer state variables beyond the past period's price. In that case, I give sufficient conditions under which the policies converge to a collusive equilibrium with positive probability, while never converging to the Nash equilibrium.
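To make the actor-critic structure described in Chapter 2 concrete, the following is a minimal single-agent sketch, not the dissertation's actual algorithms: the critic is a batch estimate of each action's value, and the actor is a logistic policy updated by one gradient step per round. The payoff means, stepsize, and batch size are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def reward(action):
        # hypothetical stage payoffs: action 1 is better on average
        means = (1.0, 1.5)
        return means[action] + rng.normal(scale=0.5)

    theta = 0.0    # actor parameter: log-odds of choosing action 1
    alpha = 0.05   # policy stepsize
    batch = 50     # observations used in each critic estimation

    for t in range(200):
        # critic step: estimate each action's value from a fresh batch
        q = np.array([np.mean([reward(a) for _ in range(batch // 2)])
                      for a in (0, 1)])
        # actor step: one gradient ascent step on the expected value
        # (1 - p1) * q[0] + p1 * q[1], whose theta-derivative is
        # p1 * (1 - p1) * (q[1] - q[0]) for the logistic policy
        p1 = 1.0 / (1.0 + np.exp(-theta))
        theta += alpha * p1 * (1.0 - p1) * (q[1] - q[0])

    print(f"P(action 1) after learning: {1.0 / (1.0 + np.exp(-theta)):.3f}")

The batch size and stepsize appear explicitly because, per the abstract, the consistency conditions in Chapter 2 depend on exactly these quantities.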
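Chapter 3 identifies long-run policies with stable rest points of a differential equation. The dissertation's specific equation is not given in the abstract, so the sketch below only shows the generic numerical check for local stability of a rest point theta* of d(theta)/dt = f(theta): all eigenvalues of the Jacobian of f at theta* must have negative real part. The dynamics f used here are an assumed toy example.

    import numpy as np

    def jacobian(f, x, eps=1e-6):
        # central finite-difference Jacobian of f at x
        x = np.asarray(x, dtype=float)
        J = np.zeros((x.size, x.size))
        for j in range(x.size):
            dx = np.zeros_like(x)
            dx[j] = eps
            J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)
        return J

    def is_stable(f, rest_point):
        # locally asymptotically stable iff every eigenvalue of the
        # Jacobian at the rest point has negative real part
        eigs = np.linalg.eigvals(jacobian(f, rest_point))
        return bool(np.all(eigs.real < 0))

    # illustrative dynamics with a stable rest point at the origin
    f = lambda x: np.array([-x[0] + 0.5 * x[1], -x[1]])
    print(is_stable(f, np.zeros(2)))   # True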
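The non-collusive benchmark in Chapter 4 can be illustrated with a textbook symmetric linear Cournot duopoly; the dissertation's actual stage game is not specified in the abstract, so the demand and cost parameters below are assumptions.

    # Inverse demand P = a - b*(q1 + q2), constant marginal cost c.
    # Closed forms: per-firm Nash quantity (a - c) / (3b), per-firm
    # joint-monopoly (collusive) quantity (a - c) / (4b).
    a, b, c = 10.0, 1.0, 2.0

    q_nash = (a - c) / (3 * b)
    q_coll = (a - c) / (4 * b)

    def profit(q_own, q_other):
        price = a - b * (q_own + q_other)
        return (price - c) * q_own

    print(f"Nash:      q = {q_nash:.3f}, profit = {profit(q_nash, q_nash):.3f}")
    print(f"Collusive: q = {q_coll:.3f}, profit = {profit(q_coll, q_coll):.3f}")

With these parameters the collusive quantity yields per-firm profit 8.000 versus 7.111 at the Nash quantity, which is what makes convergence away from the Nash benchmark economically meaningful.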