Yankai Cao

Assistant Professor

Research Interests

Optimization, Control and Operations Research
Solar and Wind Energy
Artificial Intelligence
Large Scale Optimization


Research Methodology

High performance computing
Machine learning
Stochastic Optimization


Master's students
Doctoral students
Postdoctoral Fellows
Any time / year round

Dr. Cao’s research focuses on the design and implementation of large-scale local and global optimization algorithms to solve problems that arise in diverse decision-making paradigms such as machine learning, stochastic optimization, and optimal control. His algorithms combine mathematical techniques and emerging high-performance computing hardware (e.g., multi-core CPUs, GPUs, and computing clusters) to achieve computational scalability. Furthermore, Dr. Cao has applied the algorithms and tools to address engineering and scientific questions that arise in diverse application domains including conflict resolution in energy system design, robust control of crystallization systems, predictive control of wind turbines, power management in large networks, and image classification for contaminant detection.

I support public scholarship, e.g. through the Public Scholars Initiative, and am available to supervise students and Postdocs interested in collaborating with external partners as part of their research.
I support experiential learning experiences, such as internships and work placements, for my graduate students and Postdocs.
I am open to hosting Visiting International Research Students (non-degree, up to 12 months).
I am interested in hiring Co-op students for research placements.

Complete these steps before you reach out to a faculty member!

Check requirements
  • Familiarize yourself with program requirements. You want to learn as much as possible from the information available to you before you reach out to a faculty member. Be sure to visit the graduate degree program listing and program-specific websites.
  • Check whether the program requires you to seek commitment from a supervisor prior to submitting an application. For some programs this is an essential step while others match successful applicants with faculty members within the first year of study. This is either indicated in the program profile under "Admission Information & Requirements" - "Prepare Application" - "Supervision" or on the program website.
Focus your search
  • Identify specific faculty members who are conducting research in your specific area of interest.
  • Establish that your research interests align with the faculty member’s research interests.
    • Read up on the faculty members in the program and the research being conducted in the department.
    • Familiarize yourself with their work, read their recent publications and past theses/dissertations that they supervised. Be certain that their research is indeed what you are hoping to study.
Make a good impression
  • Compose an error-free and grammatically correct email addressed to your specifically targeted faculty member, and remember to use their correct titles.
    • Do not send non-specific, mass emails to everyone in the department hoping for a match.
    • Address the faculty members by name. Your contact should be genuine rather than generic.
  • Include a brief outline of your academic background, why you are interested in working with the faculty member, and what experience you could bring to the department. The supervision enquiry form guides you with targeted questions. Be sure to craft compelling answers to these questions.
  • Highlight your achievements and why you are a top student. Faculty members receive dozens of requests from prospective students and you may have less than 30 seconds to pique someone’s interest.
  • Demonstrate that you are familiar with their research:
    • Convey the specific ways you are a good fit for the program.
    • Convey the specific ways the program/lab/faculty member is a good fit for the research you are interested in/already conducting.
  • Be enthusiastic, but don’t overdo it.
Attend an information session

G+PS regularly provides virtual sessions that focus on admission requirements and procedures, with tips on how to improve your application.




Graduate Student Supervision

Master's Student Supervision

Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.

Deep learning-based approximation of model predictive control using mixture networks (2023)

Model Predictive Control (MPC) is an optimization-based control scheme exploited in various industrial processes. It determines optimal control inputs that achieve the desired outcome by predicting future behavior based on models while satisfying system constraint sets. The consideration of complex system dynamics and multiple constraints enables the control of nonlinear processes with complicated behavior. Furthermore, because of its extensive applicability, MPC has been applied to the design of supply chain management, especially to scheduling problems that are formulated as mixed-integer linear programming (MILP) problems. However, the online implementation of MPC is challenging, especially for large-scale systems, due to the prohibitive computation cost. In recent years, the approximation method of MPC control laws using deep neural networks (DNNs) has been studied to address this issue. Nevertheless, it struggles to provide accurate approximation when multiple optimal control inputs exist for each system state. In this case, the MPC control laws follow one-to-many mappings, which DNNs cannot correctly approximate as they can only provide one-to-one mappings. Therefore, we propose mixture network-based approximation methods. Mixture networks, with components of probability (density) distributions in the output layer, can approximate the MPC control laws through a combination of conditional probabilities provided by mixing several estimated probability distributions. This approach then generates multiple control inputs with the highest probabilities. Notably, the proposed method can be applied to various problems by selecting an appropriate probability distribution, such as using a Gaussian distribution for nonlinear problems and a Bernoulli distribution for MILP problems. In this thesis, we investigate two case studies: a benchmark problem for nonlinear problems and a scheduling problem in the steel-making process for MILP problems. 
The simulation results demonstrate that the mixture network-based approximation method outperforms the DNN-based approximation method.
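
The one-to-many issue described in this abstract can be illustrated with a small sketch. It is not the thesis implementation: the function names, the scalar control input, and the rule of ranking components by mixing weight are assumptions made here for illustration. The sketch evaluates the negative log-likelihood of a control input under a Gaussian mixture (the kind of output a mixture network would produce for one system state) and returns the most probable component means as candidate inputs.

```python
import math


def gaussian_mixture_nll(u, weights, means, stds):
    """Negative log-likelihood of a scalar input u under a 1-D Gaussian mixture.

    weights, means, stds stand in for the output layer of a mixture network
    evaluated at one system state (hypothetical values, not thesis data).
    """
    log_terms = [
        math.log(w) - 0.5 * math.log(2.0 * math.pi) - math.log(s)
        - 0.5 * ((u - m) / s) ** 2
        for w, m, s in zip(weights, means, stds)
    ]
    # Log-sum-exp keeps the computation stable when terms are very negative.
    peak = max(log_terms)
    return -(peak + math.log(sum(math.exp(t - peak) for t in log_terms)))


def candidate_inputs(weights, means, top=2):
    """Means of the `top` components with the largest mixing weights."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [means[i] for i in ranked[:top]]


if __name__ == "__main__":
    # Assumed state with two equally good control inputs, u = -1 and u = +1.
    weights, means, stds = [0.5, 0.5], [-1.0, 1.0], [0.1, 0.1]
    print(candidate_inputs(weights, means))
    # The average of the two optima (u = 0) is far less likely than either one.
    print(gaussian_mixture_nll(1.0, weights, means, stds))
    print(gaussian_mixture_nll(0.0, weights, means, stds))
```

A single-output regressor trained on such data would tend toward the average of the two optima, which the mixture assigns very low likelihood; the mixture keeps both modes separate.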


Global optimization of clustering problems (2022)

Clustering is a fundamental unsupervised machine learning task that aims to group similar data into the same cluster and separate dissimilar data into different clusters. Cluster analysis can always be formulated as an optimization problem, and different objective functions lead to different clustering problems. In this thesis, we concentrate on the k-means and k-center problems, each of which can be formulated as a mixed-integer nonlinear programming problem. The work on k-means clustering optimization was published at ICML 2021 [30]. We also submitted the work on global optimization of k-center clustering to ICML 2022, and the paper has been accepted in Phase 1 of reviewing. This thesis provides a practical global optimization algorithm for these two tasks based on a reduced-space spatial branch and bound (BB) scheme. The algorithm guarantees convergence to the global optimum while branching only on the cluster centers, so the number of branching variables is independent of the dataset’s cardinality. We also design several methods to construct lower and upper bounds at each node in the BB scheme. In addition, for the k-center problem, a set of feasibility-based bounds tightening techniques is proposed to narrow the domain of the centers and significantly accelerate convergence. To demonstrate the capacity of this algorithm, we present computational results on UCI datasets and compare our proposed algorithms with off-the-shelf global optimization solvers and classical local optimization algorithms. For k-means clustering, the numerical experiments demonstrate our algorithm’s ability to handle datasets with up to 200,000 samples. For k-center clustering, the serial implementation of the algorithm on a dataset with 14 million samples and 3 features attains the global optimum to within an optimality gap of 0.1% within 2 hours.
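
As a rough illustration of the two objectives and of why bounds can be computed from boxes on the centers alone, the sketch below evaluates the k-means and k-center costs for given centers and computes a simple lower bound on the k-center cost when each center is restricted to an axis-aligned box. This is a minimal sketch under stated assumptions, not the bounding scheme of the thesis: the function names and the distance-to-box bound are chosen here for illustration.

```python
import math


def k_means_cost(points, centers):
    """Sum over samples of the squared distance to the nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def k_center_cost(points, centers):
    """Largest distance from any sample to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


def dist_to_box(p, lower, upper):
    """Euclidean distance from point p to the box [lower, upper]."""
    clamped = [min(max(x, lo), hi) for x, lo, hi in zip(p, lower, upper)]
    return math.dist(p, clamped)


def k_center_lower_bound(points, boxes):
    """Lower bound on the k-center cost when center k must lie in boxes[k].

    No center inside a box can be closer to a sample than the box itself,
    so replacing center distances by box distances can only lower the cost.
    """
    return max(min(dist_to_box(p, lo, hi) for lo, hi in boxes) for p in points)


if __name__ == "__main__":
    # Hypothetical 2-D data with two obvious groups.
    points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    centers = [(0.5, 0.0), (10.5, 0.0)]
    print(k_means_cost(points, centers), k_center_cost(points, centers))

    # One branch-and-bound node: each center is confined to a box.
    boxes = [((0.0, 0.0), (0.4, 0.0)), ((10.0, 0.0), (10.4, 0.0))]
    print(k_center_lower_bound(points, boxes))
```

In a branch and bound scheme of this kind, a node whose lower bound exceeds the best known cost can be discarded without examining any centers inside its boxes.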


