Sathish Gopalakrishnan

Associate Professor

Relevant Thesis-Based Degree Programs


Graduate Student Supervision

Doctoral Student Supervision

Dissertations completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest dissertations.

An observation-based runtime configuration framework for the front-end network edge (2023)

Despite the prominence of automated runtime configuration procedures, relatively little is known about managing the runtime configurations of general-purpose programming in resource-constrained IoT platforms at the network edge. For example, high-level language-written application programming (e.g., video/audio surveillance) in IoT enables local data processing to decrease latency, bandwidth, and infrastructure costs and address data safety and privacy concerns. However, without a good configuration, such computing generates undesirable performance or sudden and unexpected resource outages, leading to an application or a complete system failure. On the other hand, stringent resources in IoT make the performance of general-purpose programming highly discontinuous, which the existing linear or non-linear models can not capture. As a result, while the current configuration techniques make typical computing (e.g., cloud, High-Performance Computing (HPC)) efficient, it still needs to be determined whether or not they are efficient enough to manage general-purpose edge computing.This research systematically analyzed the runtime configuration challenges for general-purpose programming in IoT. In the process, we discovered several new application performance associations and system resource variance patterns in this state space with which we address the constraints, heterogeneity, discontinuity, and scalability issues of IoT at the network edge. We applied these performance associations and other systematic state space sampling methods to address these issues as they arise in two important and prominent areas of automated runtime configuration: (1) resource-exhaustion detection and (2) performance optimization. The latter area is divided more into a pipeline configuration and b collocated performance approximation.With cross-platform failure prediction, configuration management, and approximation techniques, we apply an intelligent and general set of configuration capabilities to general-purpose edge computing. Across various real-world case studies, our techniques outperform conventional runtime configuration techniques regarding performance improvements and approximation accuracy and pave the way for a new direction toward general-purpose edge computing.

View record

Reinforcement learning for data scheduling in internet of things (IoT) networks (2020)

I investigate data prioritization and scheduling problems on the Internet of Things (IOT) networks that encompass large volumes of data. The required criteria for prioritizing data depend on multiple aspects such as preservation of importance and timeliness of data messages in environments with different levels of complexity. I explore three representative problems within the landscape of data prioritization and scheduling. First, I study the problem of scheduling for polling data from sensors where it is not possible to gather all data at a processing centre. I present a centralized mechanism for choosing sensors to gather data at each polling epoch. Our mechanism prioritizes sensors using information about the data generation rate, the expected value of the data, and its time sensitivity. Our work relates to the restless bandit model in a continuous state space, unlike many other such models. The contribution is to derive an index policy and show that it can be useful even when not optimal through a quantitative study where event arrivals follow a hyper-exponential distribution.Second, I study the problem of balancing timeliness and criticality when gathering data from multiple sources using a hierarchical approach. A central decision-maker decides which local hubs to allocate bandwidth to, and the local hubs have to prioritize the sensors’ messages. An optimal policy requires global knowledge of messages at each local hub, hence impractical. I propose a reinforcement-learning approach that accounts for both requirements. The proposed approach’s evaluation results show that the proposed policy outperforms all the other policies in the experiments except for the impractical optimal policy. Finally, I consider the problem of handling timeliness and criticality trade-off when gathering data from multiple resources in complex environments. There exist dependencies among sensors in such environments that lead to patterns in data that are hard to capture. Motivated by the success of the Asynchronous Advantage Actor-Critic (A3C) approach, I modify the A3C by embedding Long Short Term Memory (LSTM) to improve performance when vanilla A3C could not capture patterns in data. I show the effectiveness of the proposed solution based on the results in multiple scenarios.

View record

High assurance real-time systems: scheduling, budgeting, and safety (2019)

In this dissertation we investigate resource allocation and task scheduling problems pertaining to real-time systems that mandate high levels of assurance. The required assurance guarantees are those of safety and quality of service, atop the basic guarantee required by any (classical) real-time system, namely, timely delivery of responses. We explore three major ideas within the landscape of high assurance real-time systems. First, we study the problem of scheduling Mixed-Criticality real-time tasks, with probabilistic guarantees, on a single processor. The major contribution is that we incorporate software failure rates into the scheduling problem, and we synthesize feasible scheduling policies under which the input task set will satisfy certain failure rate requirements. We model the task scheduling problem as a path-Constrained Markov Decision Process (CMDP). Second, realizing the exorbitant cost of developing safety-critical software, we develop cost-effective methods for achieving both safety integrity and timing predictability through the notion of run-time monitors, coupled with isochronous execution. This model gives rise to interesting multiversion task scheduling problems on multiple processors. We show the intractability of non-preemptive variants of the scheduling problems, and we derive optimal solutions to preemptive variants in polynomial-time. Finally, we consider a general resource budgeting problem where recurrent tasks should maintain certain quality of service levels over an indefinite operational horizon, and where both the task execution demands and the available budgets are noisy and fluctuate randomly. We derive large deviation bounds that are sufficient conditions that would allow the system designer to properly dimension system resources and allocate tasks sufficient budgets to sustain the desired quality of service levels. We then apply the bounds to a case study involving safety-critical systems that employ run-time monitors and isochronous execution, and we show how to synthesize minimum monitor worst-case execution times sufficient to achieve the desired quality of service and meet hard deadlines.

View record

Delay, Reliability, and Trust in Information Dissemination in Vehicular Networks (2016)

The cost of road accidents globally is more than $500 B every year. Intelligent Transportation Systems (ITS) is a promising solution and Vehicular Networks is an ideal candidate for providing a communication platform for ITS applications. Safety-critical applications form the main motivation for intelligent transportation systems. Studying the major concerns in such applications, i.e., delay and reliability, through mathematical analysis is extremely beneficial because it enables us to design optimized schemes. Such analysis is, however, challenging due to the dynamics of a vehicular network. In this research, we have three main contributions. First, we present a mathematical model to study delay and reliability of emergency message dissemination in vehicular networks. This model includes three modules to address effects of channel, contention, and partitioning on delivery delay. This is the first delay model to the best of our knowledge that does all of these, and it gives network architects insight into the limitations of the network and helps them tune parameters such as transmission power and slot time duration. Simulation studies indicate that our model does capture the delay characteristics of vehicular networks for both highway and urban scenarios.An interesting observation from the analytical model confirms the fact that using the vehicle density on the road is a good metric for setting the right forwarding probability in vehicles. We exploit this conclusion and as our second contribution, we propose a completely distributed forwarding strategy, called Middle Is Next or MIN. Extensive simulations affirm the effectiveness of MIN in terms of delay and single-hop reliability in comparison with other well-known routing methods. Trusted communication in vehicular networks is of crucial importance without which all efforts for minimizing the delay or maximizing the reliability could be voided. As our third contribution, we propose FACT: Framework for Application-oriented Context-aware Trust-based communication in vehicular networks. FACT assigns a trust value to each road segment and one to each neighbourhood, instead of each car. Thus, it scales up easily and is completely distributed. Experimental results demonstrate that FACT outperforms other well-known routing protocols since it routes the messages via trusted paths.

View record

Resource allocation and scheduling in wireless mesh networks (2013)

The unreliability of wireless mesh networks creates challenge in designing high performance wireless networks in terms of network throughput, end-to-end delay, and fairness provisioning. In this thesis, the goal is to improve the network performance in terms of these metrics. We explore several techniques such as multipath routing, channel coding, network coding, and interference alignment. We consider resource allocation both in terms of average data rate provisioning and scheduling policies in a time slot basis.First, we propose data rate and channel code rate allocation algorithms for networks with multiple paths to maximize the network throughput while all users can fairly exploit the network resources. We study the effect of adaptive and non-adaptive channel coding schemes. We also consider the end-to-end delay that each network flow experiences for data transmission. For that purpose, we formulate the problem of decreasing the end-to-end delay for network flows while improving the network throughput. Simulation results show that we can decrease the delay at the cost of a slight decrease in network throughput. We also formulate a data rate allocation problem in networks with network coding. Simulation results show that considering link reliabilities in the network coding design dramatically increases the network performance.Data rate allocation algorithms provide the average data rates at which the source must transmit data. They do not determine scheduling on a time slot basis. To address that, we consider transmission scheduling in wireless networks. We also compare the suggested algorithm with a centralized optimal data rate allocation algorithm to verify that our algorithm follows the optimal solution. Through simulations, we show that fairness provisioning leads to higher network performance. We show that the suggested algorithm outperforms the current algorithms in the literature in terms of both network throughput and fairness provisioning.Finally, we consider transmission scheduling in wireless multi-input multi-output (MIMO) systems. We formulate the problem of joint scheduling, interference alignment, and admission control in those networks and use Lyapunov stability theory to solve it. We also develop a heuristic approach to solve the problem in a semi-distributed manner.

View record

Tolerating intermittent hardware errors: Characterization, diagnosis and recovery (2013)

Over three decades of continuous scaling in CMOS technology has led to tremendous improvements in processor performance. At the same time, the scaling has led to an increase in the frequency of hardware errors due to high process variations, extreme operating conditions and manufacturing defects. Recent studies have found that 40% of the processor failures in real-world machines are due to intermittent hardware errors. Intermittent hardware errors are non-deterministic bursts of errors that occur in the same physical location. Intermittent errors have characteristics that are different from transient and permanent errors, which makes it challenging to devise efficient fault tolerance techniques for them.In this dissertation, we characterize the impact of intermittent hardware faults on programs using fault injection experiments at the micro-architecture level. We find that intermittent errors are likely to generate software visible effects when they occur. Based on our characterization results, we build intermittent error tolerance techniques with focus on error diagnosis and recovery. We first evaluate the impact of different intermittent error recovery scenarios on a processor's performance and availability. We then propose DIEBA (Diagnose Intermittent hardware Errors in microprocessors by Backtracing Application), a software-based technique to diagnose the fault-prone functional units in a processor.

View record

Master's Student Supervision

Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.

DCRA: Decentralized Cognitive Resource Allocation model for game as a service (2020)

Gaming-as-a-Service (GaaS) has rapidly emerged to the industry of cloud gaming. The power of GaaS lies on having one source code base with multiple users. Several systems were proposed to model GaaS. However, there are no scalable and reliable models for such a service. The importance of having such a model lies on having an Internet-scale platform able to provide flexibility of different types of games genre and lower the barrier of end systems (i.e. mobile clients) while taking into consideration the probability of excessive loads and failures. We present a Distributed Cognitive Resource Allocation (DCRA) model to run mobile games on a large-scale distributed system in which we have improvised a unique distributed hash table (DHT)-based routing to expedite the messaging among servers and to minimize the round trip delay to acceptable levels for the targeted mobile games genre. In contrast to existing centralized models, DCRA scales with the increase of mobile clients to handle high concurrent loads of clients' requests while providing a stable level of gaming experience. The results show that DCRA is able to scale well by providing almost fixed throughput and delay while increasing the clients requests load. Also, the system preserve its key features while simulating failures.

View record

Partially-observed graph abstractions for authorship identification and process interference prediction (2017)

Pairwise interactions between objects can be modeled as a graph, which is a set of nodes and edges that each connect a pair of nodes. We consider the problem of predicting whether edges exist between nodes based on other pairs of nodes that we have observed. From a partially-observed graph we extract several neighbourhood and path features, then evaluate several machine learning algorithms for predicting whether edges will exist between unobserved nodes. K-Nearest Neighbours was found to be the best classifier. We propose the novel use of path on a weighted graph as a feature used for prediction. We apply this abstraction to predicting collaboration between authors in an on-line publication database. The unweighted graph contains an edge if two authors collaborated and the weighted graph encodes the number of collaborations. Prediction error rates were less than 3% under ten-fold cross-validation. We also apply this abstraction to predicting whether processes running on the same hardware will compete for resources. The unweighted graph contains an edge if a process executed in more time than when running with another than by itself.The weighted graph had an edge weight that was the increase in execution time. Prediction error rates were less than 14% under ten-fold cross-validation.

View record

Stronghold: A secure data platform for smart homes (2013)

Security is an important concern in day to day life, far augmented when related to sensitive data, such as private information. As such, it is paramount to protect environments and devices wherein the greater amount of information is private, such as in a home, and its encompassing devices; especially so, as the level of privacy desired in a home is much greater than other private mediums. The issue is further exacerbated in a smart home scenario, where applications are ubiquitous and are constantly communicating with the outside world: this impending technological innovation results in previously private data suddenly becoming accessible through non-physical means, allowing potential breaches in privacy. The accessibility to previously unreachable means brings forth new threats that must be tackled to ensure proper confidentiality is kept. Furthermore, proper accessibility of the physical devices must be engaged, due to their newfound non-physically accessibly nature. We survey the security threats existent in smart home environments, along with possible solutions to mitigate the threats. We use methods ranging from human computer interaction, storage optimization, and behavioral learning to better ensure proper functionality. We integrate the solutions in a cohesive form to be applied to any smart home environment that wishes to best keep high confidentiality, availability and integrity. We test the system to ensure the secure data messaging platform provides sufficient throughput for high definition media storage in real time, as potentially necessary in a smart homeWe supplement our study by creating a system capable of functioning on top of the smart server, which unobtrusively automates the daily task of recipe selection, via meal advocation through past meal selection; we do so unobtrusively in an attempt to prevent deviation from conventional home environments. We present a method which possesses higher granularity than the present meal recommendation technologies, and yet, requires less interaction than the web equivalents. Finally, we interface the algorithm to a real smart home server, evaluate the application on real users, and assess their reaction to the technology.

View record

Response-time Analysis and Overload Management in Real-time Systems (2012)

We provide two approaches to handling overload in shared resources offering realtimeguarantees. We first provide a technique (based on mathematical optimization)for identifying the possible causes for an overload situation by computing theworst-case demand of the system, depending upon the amount of requests serviced.Worst-case analysis of response time has a pseudo-polynomial time complexity,and when there is no knowledge about the workload, the complexity further increases.We provide polynomial-time heuristics to reduce the computation timeof the algorithm. Further, we evaluate it against other techniques using stochasticanalysis to stress on the accuracy and ease of estimation of the result. The schedulingpolicy based on the approach is useful to detect an overload in the resource andto allow us to make responsible decisions on it. Secondly, we present a schedulingpolicy (obtained through stochastic approximation) to handle overload in real-timesystems. Competitive analysis of online algorithms has commonly been applied tounderstand the behavior of real-time systems during overload conditions. Whilecompetitive analysis provides insight into the behavior of certain algorithms, it ishard to make inferences about the performance of those algorithms in practice.Similar on-line scheduling approaches tend to function differently in practice dueto factors. Further, most work on handling overload in real-time systems doesnot consider using information regarding the distribution of arrival rates of jobsand execution times to make scheduling decisions. With some information aboutthe workload, we aim to improve the revenue earned by the service provider, ina scenario when each successful job completion results in revenue accrual. Weprove that the policy we outline does lead to increased revenue when compared toa class of scheduling policies that make static resource allocations to different service classes. We also use empirical evidence to underscore the fact that this policyperforms better than a variety of other scheduling policies. The ideas presentedcan be applied to several soft real-time systems, specifically systems with multipleservice classes.

View record

A study of datta pertitioning and prefetching for hybrid storage systems (2011)

Storage system performance has received much attention since the early days of computing systems because of the mismatch between computing power and I/O access times. The invention of new technologies have increased storage system performance, but due to the cost-performance trade off no one type of storage media is capable to meet both performance and capacity requirements. This motivated us to study the impact of data management techniques such as data partitioning and correlated prefetching on I/O performance when two different non-volatile storage media are integrated into a computing system. First, we consider partitioning data blocks between two devices, where one device is significantly faster than the other. We assume that significantly faster performance also implies a significantly smaller capacity. Clearly not all data can be stored or cached in the faster device. Second, to improve performance of the slower device, we investigate if correlation-directed prefetching (CDP) may offer significant benefits. Although CDP has been studied previously, we look into some special aspects of it. We analyze how different block correlation analysis heuristics affect the performance of CDP.We developed a simulator to study the effect of the different techniques when using devices with differing characteristics. Our results show that data partitioning can significantly improve storage system performance. For a hard disk and solid-state drive based system, we achieved 2--92% improvement for different traces. We also show that data partitioning based on application long-range block access patterns performs significantly better than caching temporal locality of references.To evaluate the hybrid system in real world settings, we present a case study, a prototype data block manager for Linux-based systems that permits data to be partitioned across an SSD and an HDD. This partitioning is transparent to the file system and the block manager can also trigger data prefetches when there is high correlation between data block accesses.

View record

Approximation algorithms for task allocation with QoS and energy considerations (2011)

We consider general problems of allocating tasks to processors where eachtask is associated with a set of service classes. A service class is a tuple:the first element represents the resource utilization and the second elementthe quality of service associated with that resource utilization. Before allocatinga task to a processor, we need to assign it to a service class. Weconsider some elementary problems that arise from this setting. What is theminimum number of processors needed if we also need to attain a minimumaggregate QoS level? Given a fixed set of processors, what is the maximumattainable aggregate QoS? Such questions apply to the partitioned schedulingof real-time tasks on multiprocessors and to other resource allocationproblems. The basic questions are NP-Hard, and we present polynomialtime approximation algorithms for certain special, but practical, cases. Wemake interesting use of maximum flows in a bipartite graph while developingthe polynomial time approximation schemes. We then integrate energy expenditureto the model above and address the problem of partitioning a setof independent, periodic, real-time tasks over a fixed set of heterogeneousprocessors while minimizing the energy consumption of the computing platformsubject to a guaranteed quality of service requirement. This problemis NP-Hard and we present a fully polynomial time approximation schemefor this problem. The main contribution of our work is in tackling the problemin a completely discrete, and possibly arbitrarily structured, setting. Inother words, each processor has a discrete set of speed choices. Each taskhas a computation time that is dependent on the processor that is chosento execute the task and on the speed at which that processor is operated. Further, the energy consumption of the system is dependent on the decisionsregarding task allocation and speed settings.

View record

On energy-efficient data gathering in sensor networks (2011)

Wireless sensor networks (WSNs) are becoming increasingly important as they provide an ideal platform for monitoring and controlling physical environments. Starting from small match-box sized nodes to tiny coin-like sensors a WSN promises to be the most ubiquitous information gathering system produced. Being tiny enables ubiquitous and pervasive sensing. However, this form factor comes at the cost of severe resource constraints for the sensor nodes. The nodes can accommodate only a certain amount of energy in the form of batteries and can store and process only a small amount of data due to its crippled size. Due to this reason, sensor networks cannot host more sophisticated applications and the mean time to failure, due to nodes running out of energy, is also low. These are probably the main reasons why sensor networks have not reached their expected potential. This thesis is an effort to alleviate the energy problem of sensor nodes. We attempt to solve the problem using different data and user centric models which can lead to a multi-fold increase of life for the sensors.Most of the research till date has aimed at micro-adjustments in the sensor hardware and software in order to improve performance. These techniques, though beneficial, increase complexity, and are sometimes difficult to implement. The thesis demonstrates simple techniques that can significantly improve energy-savings over and above the micro-adjustment techniques. The thesis takes a radical point of view and looks at higher level primitives that can be modified for certain applications.This thesis provides two energy reduction techniques. The first technique involves aggressive duty-cycling of sensors while maintaining connectivity and coverage followed by reconstruction at the base station. Significant gains can be achieved when the sensed environment has correlation between sensor readings. The second technique involves sampling interval scheduling depending on the utilization of the sampled data based on user queries. While the first method ensures correct reproduction of the sensed environment, while reducing the burden on individual sensors, the second method provides an optimal sampling frequency that regulates energy consumption depending on user demands.

View record

Current Students & Alumni

This is a small sample of students and/or alumni that have been supervised by this researcher. It is not meant as a comprehensive list.

If this is your researcher profile you can log in to the Faculty & Staff portal to update your details and provide recruitment preferences.


Get key application advice, hear about the latest research opportunities and keep up with the latest news from UBC's graduate programs.