Karthik Pattabiraman

Associate Professor

Relevant Degree Programs


Graduate Student Supervision

Doctoral Student Supervision (Jan 2008 - Nov 2019)
Addressing security in drone systems through authorization and fake object detection (2020)

There now exists more than eight billion IoT devices with expected growth to reach over 22 billion by 2025. IoT devices are comprised of sensor and actuator components which generate live-stream data and share information via a common communication link, e.g., the Internet. For example, in a smart home, a number of IoT devices such as a Google Home/Amazon Alexa, smart plugs, security cameras, a garage door, and a thermostat connect to the WiFi network to routinely communicate with each other, share information, and take actions accordingly. However, a main security challenge is protecting shared information between authorized devices/users while distinguishing real objects from fake ones in the network. Such a challenge aggravates man-in-the-middle, and denial-of-service vulnerabilities. To defend such concerns, in this thesis, we first propose an authorization framework called Dynamic Policy-based Access Control (DynPolAC) as a model for protecting information in dynamic and resource-constrained IoT systems. We focus our experiments with DynPolAC on an IoT environment comprised of drones. DynPolAC achieves more than 7x speed performance improvements in authorization when compared to previously proposed methods for resource-constrained IoT platforms such as drones. Secondly, in this thesis, we implement a method called Phoenix to detect fake drones in an IoT network from real drones. We experimentally train and derive Phoenix from a control function called the Lyapunov stability function. We evaluate Phoenix for drones using an autopilot simulator as well as flying a real drone. We find that Phoenix takes about 50 ms to distinguish real drones from fake ones, while by asymmetry, it could take days for motivated attackersto reconstruct Phoenix. Phoenix also achieves a precision rate of 99.55% to detect real drones and a recall rate of 99.84% to detect fake drones.

View record

Approaches for building error resilient applications (2020)

Transient hardware faults have become one of the major concerns affecting the reliability of modern high-performance computing (HPC) systems. They can cause failure outcomes for applications, such as crashes and silent data corruptions (SDCs) (i.e. the application produces an incorrect output). To mitigate the impact of these failures, HPC applications need to adopt fault tolerance techniques.The most common practices of fault tolerance techniques include (i) characterization techniques, such as fault injection and architectural vulnerability factor (AVF)/program vulnerability factor (PVF) analysis; (ii) run-time error detection techniques; and (iii) error recovery techniques. However, these approaches have the following shortcomings: (i) fault injections are generally time-consuming andlack predictive power, while the AVF/PVF analysis offers low accuracy; (ii) prior techniques often do not fully exploit the program’s error resilience characteristics; and (iii) the application constantly pays a performance/storage overhead.This dissertation proposes comprehensive approaches to improve the above techniques in terms of effectiveness and efficiency. In particular, this dissertation makes the following contributions: First, it proposes ePVF, a methodology that distinguishes crash-causing bits from the architecturally correct execution (ACE) bits and obtains a closer estimate of the SDC rate than PVF analysis (by 45% to 67%). To reduce the overall analysis time, it samples representative patterns from ACE bits and obtains a good approximation (less than 1% error) for the overall prediction. This dissertation applies the ePVF methodology to error detection, which leads to a 30% lower SDC rate than well-accepted hot-path instruction duplication.Second, this dissertation combines the roll-forward recovery and the roll-back recovery schemes and demonstrates the improvement in the overall efficiency of the C/R with two systems: LetGo (for faults affecting computational components) and BonVoision (for faults affecting DRAM memory). Overall, LetGo is able to elide 62% of the crashes caused by computational faults and convert them to continued execution (out of these 80% result in correct output while a majority of the rest fall back on the traditional roll-back recovery technique). BonVoision is able to continue to completion 30% of the DRAM memory detectable but uncorrectable errors (DUEs).

View record

Understanding and modeling error propagation in programs (2019)

Hardware errors are projected to increase in modern computer systems due to shrinking feature sizes and increasing manufacturing variations. The impact of hardware faults on programs can be catastrophic, and can lead to substantial financial and societal consequences. Error propagation is often the leading cause of catastrophic system failures, and hence must be mitigated. Traditional hardware only techniques to avoid error propagation are energy hungry, and hence not suitablefor modern computer systems (i.e., commodity systems). Researchers have proposed selective software-based protection techniques to prevent error propagation at lower costs. However, these techniques use expensive fault injection simulations to determine which parts of a program must be protected. Fault injection simulation artificially introduces a fault to program execution and observefailures (if any) upon the completion of the program execution. Thousands of such simulations need to be performed in order to achieve statistical significance. It is time-consuming as even a single program execution of a common application may take a long time. In this dissertation, I first characterize error propagation in programs that lead to different types of failures, proposed both empirical and analytical approaches to identify and mitigate error propagation without expensive fault injections. The key observation is that only a small fraction of states are responsible for almost all error propagation in programs, and the propagation falls into identifiable patterns which can be modeled efficiently. The proposed techniques are nearly as close as fault injection approaches in measuring failure rates of programs, and orders of magnitude faster than fault injections. This allows developers to build low-cost fault-tolerant applications in an extremely efficient manner.

View record

Understanding motifs of program behaviour and change (2018)

Program comprehension is crucial in software engineering; a necessary step for performing many tasks. However, the implicit and intricate relations between program entities hinder comprehension of program behaviour and change. It is particularly a difficult endeavour to understand dynamic and modern programming languages such as JavaScript, which has grown to be among the most popular languages. Comprehending such applications is challenging due to the temporal and implicit relations of asynchronous, DOM-related and event-driven entities spread over the client and server sides.The goal of the work presented in this dissertation is to facilitate program comprehension through the following techniques. First, we propose a generic technique for capturing low-level event-based interactions in a web application and mapping those to a higher-level behavioural model. This model is then transformed into an interactive visualization, representing episodes of execution through different semantic levels of granularity. Then, we present a DOM-sensitive hybrid change impact analysis technique for JavaScript through a combination of static and dynamic analysis. Our approach incorporates a novel ranking algorithm for indicating the importance of each entity in the impact set. Next, we introduce a method for capturing a behavioural model of full-stack JavaScript applications’ execution. The model is temporal and context-sensitive to accommodate asynchronous events, as well as the scheduling and execution of lifelines of callbacks. We present a visualization of the model to facilitate program comprehension for developers. Finally, we propose an approach for facilitating comprehension by creating an abstract model of software behaviour. The model encompasses hierarchies of recurring and application-specific motifs. The motifs are abstract patterns extracted from traces through our novel technique, inspired by bioinformatics algorithms. The motifs provide an overview of the behaviour at a high level, while encapsulating semantically related sequences in execution. We design a visualization that allows developers to observe and interact with inferred motifs.We implement our techniques in open-source tools and evaluate them through a set of controlled experiments. The results show that our techniques significantly improve developers’ performance in comprehending the behaviour and impact of change in software systems.

View record

Security analysis and intrusion detection for embedded systems (2017)

No abstract available.

On the detection, localization and repair of client-side JavaScript faults (2016)

With web application usage becoming ubiquitous, there is greater demand for making such applications more reliable. This is especially true as more users rely on web applications to conduct day-to-day tasks, and more companies rely on these applications to drive their business. Since the advent of Web 2.0, developers often implement much of the web application’s functionality at the client-side, using client-side JavaScript. Unfortunately, despite repeated complaints from developers about confusing aspects of the JavaScript language, little work has been done analyzing the language’s reliability characteristics. With this problem in mind, we conducted an empirical study of real-world JavaScript bugs, with the goal of understanding their root cause and impact. We found that most of these bugs are DOM-related, which means they occur as a result of the JavaScript code’s interaction with the Document Object Model (DOM). Having gained a thorough understanding of JavaScript bugs, we designed techniques for automatically detecting, localizing and repairing these bugs. Our localization and repair techniques are implemented as the AutoFLox and Vejovis tools, respectively, and they target bugs that are DOM-related. In addition, our detection techniques – Aurebesh and Holocron – attempt to find inconsistencies that occur in web applications written using JavaScript Model-View-Controller (MVC) frameworks. Based on our experimental evaluations, we found that these tools are highly accurate, and are capable of finding and fixing bugs in real-world web applications.

View record

Tolerating intermittent hardware errors : characterization, diagnosis and recovery (2013)

Over three decades of continuous scaling in CMOS technology has led to tremendous improvements in processor performance. At the same time, the scaling has led to an increase in the frequency of hardware errors due to high process variations, extreme operating conditions and manufacturing defects. Recent studies have found that 40% of the processor failures in real-world machines are due to intermittent hardware errors. Intermittent hardware errors are non-deterministic bursts of errors that occur in the same physical location. Intermittent errors have characteristics that are different from transient and permanent errors, which makes it challenging to devise efficient fault tolerance techniques for them.In this dissertation, we characterize the impact of intermittent hardware faults on programs using fault injection experiments at the micro-architecture level. We find that intermittent errors are likely to generate software visible effects when they occur. Based on our characterization results, we build intermittent error tolerance techniques with focus on error diagnosis and recovery. We first evaluate the impact of different intermittent error recovery scenarios on a processor's performance and availability. We then propose DIEBA (Diagnose Intermittent hardware Errors in microprocessors by Backtracing Application), a software-based technique to diagnose the fault-prone functional units in a processor.

View record

Master's Student Supervision (2010 - 2018)
Multi-dimensional invariant detection for cyber-physical system security : a case study of smart meters and smart medical devices. (2018)

Cyber-Physical Systems (CPSes) are being widely deployed in security- critical scenarios such as smart homes and medical devices. Unfortunately, the connectedness of these systems and their relative lack of security measures makes them ripe targets for attacks. Specification-based Intrusion Detection Systems (IDS) have been shown to be effective for securing CPSs. Unfortunately, deriving invariants for capturing the specifications of CPS systems is a tedious and error-prone process. Therefore, it is important to dynamically monitor the CPS system to learn its common behaviors and formulate invariants for detecting security attacks. Existing techniques for invariant mining only incorporate data and events, but not time. However, time is central to most CPSes, and hence incorporating time in addition to data and events, is essential for achieving low false positives and false negatives.This thesis proposes ARTINALI : A Real-Time-specific Invariant iNfer- ence ALgorIthm, which mines dynamic system properties by incorporating time as a first-class property of the system. We build ARTINALI-based Intrusion Detection Systems (IDSes) for two CPSes, namely smart meters and smart medical devices, and measure their efficacy. We find that the ARTINALI-based IDS significantly reduces the ratio of false positives and false negatives by 16 to 48% (average 30.75%) and 89 to 95% (average 93.4%) respectively over other dynamic invariant detection tools. Furthermore, it incurs about 32% performance overhead, which is comparable to other invariant detection techniques.

View record

Configurable detection of SDC-causing errors in programs (2015)

Silent Data Corruption (SDC) is a serious reliability issue in many domains, including embedded systems. However, current protection techniques are brittle, and do not allow programmers to trade off performance for SDC coverage. Further, many of them require tens of thousands of fault injection experiments, which are highly time-intensive. In this paper, we propose two empirical models, namely SDCTune and SDCAuto, to predict the SDC proneness of a program’s data. Both models are based on static and dynamic features of the program alone, and do not require fault injections to be performed. We then develop an algorithm using both models to selectively protect the most SDC-prone data in the program subject to a given performance overhead bound. Our results show that both models are accurate at predicting the SDC rate of an application. And in terms of efficiency of detection (i.e., ratio of SDC coverage provided to performance overhead), our technique outperforms full duplication by a factor of 0.78x to 1.65x with SDCTune model, and 0.62x to 0.96x with SDCAuto model.

View record

Finding resilience-friendly compiler optimizations using meta-heuristic search techniques (2015)

With the projected increase in hardware error rates in the future, software needs to be resilient to hardware faults. An important factor affecting a program's error resilience is the set of optimizations used when compiling it. Compiler optimizations typically optimize for performance or space, and rarely for error resilience. However, prior work has found that applying optimizations injudiciously can lower the program's error resilience as they often eliminate redundancy in the program. In this work, we propose automated techniques to find the set of compiler optimizations that can boost performance without degrading its overall resilience. Due to the large size of the search space, we use search heuristic algorithms to efficiently explore the space and find an optimal sequence of optimizations for a given program. We find that the resulting optimization sequences have significantly higher error resilience than the standard optimization levels (i.e., O1, O2, O3), while attaining comparable performance improvements with the optimizations levels. We also find that the resulting sequences reduce the overall vulnerability of the applications compared to the standard optimization levels.

View record

Mining Stack Overflow for questions asked by web developers (2015)

No abstract available.

Evaluating the Error Resilience of GPGPU Applications (2014)

No abstract available.

Failure analysis and prediction in compute clouds (2014)

Most cloud computing clusters are built from unreliable, commercial off-the-shelf components compared with supercomputer clusters. The high failure rates in their hardware and software components result in frequent node and application failures. Therefore, it is important to understand their failures to design a reliable cloud system. This thesis presents a characterization study of cloud application failures, and proposes a method to predict application failures in order to save resources.We first analyze a workload trace from a production cloud cluster and characterize the observed failures. The goal of our work is to improve the understanding of failures in compute clouds. We present the statistical properties of job and task failures, and attempt to correlate them with key scheduling constraints, node operations, and attributes of users in the cloud. We observe that there are many opportunities to enhance the reliability of the applications running in the cloud, and further nd that resource usage patterns of the jobs can be leveraged by failure prediction techniques.Next, we propose a prediction method based on recurrent neural networks to identify the failures. It takes the resource usage measurements or performance data, and generate features to categorize the applications into different classes. We then evaluate the method on the cloud workload trace. Our results show that the model is able to predict application failures. Moreover, we explore early classification to identify failures, and find that the prediction algorithm provides the cloud system enough time to take proactive actions much earlier than the termination of applications to avoid resource wastage.

View record

Integrated hardware-software diagnosis of intermittent faults (2014)

Intermittent hardware faults are hard to diagnose as they occur non-deterministically. Hardware-only diagnosis techniques incur significant power and area overheads. On the other hand, software-onlydiagnosis techniques have low power and area overheads, but have limited visibility into many micro-architecturalstructures and hence cannot diagnose faults in them. To overcome these limitations, we propose a hardware-softwareintegrated framework for diagnosing intermittent faults. The hardware part of our framework, called SCRIBEcontinuously records the resource usage information of every instruction in the processor, and exposes it tothe software layer. SCRIBE has 0.95% on-chip area overhead, incurs a performance overhead of 12% and power overhead of 9%, on average.The software part of our framework is called SIED and uses backtracking from the program's crash dump to find the faulty micro-architectural resource. Our technique has an average accuracy of 84% in diagnosing the faulty resource, which in turn enables fine-grained deconfiguration with less than 2% performance loss after deconfiguration.

View record

Error detection for soft computing applications (2013)

Hardware errors are on the rise with reducing chip sizes, and power constraints have necessitated the involvement of software in hardware error detection. At the same time, emerging workloads in the form of soft computing applications, (e.g., multimedia applications) can tolerate most hardware errors as long as the erroneous outputs do not deviate significantly from error-free outcomes. We term outcomes that deviate significantly from the error-free outcomes as Egregious Data Corruptions (EDCs). In this thesis, we propose a technique to place detectors for selectively detecting EDC causing errors in an application. Our technique identifies program locations for placing high coverage detectors for EDCs using static analysis and runtime profiling. We evaluate our technique on six benchmarks to measure the EDC coverage under given performance overhead bounds. Our technique achieves an average EDC coverage of 82%, under performance overheads of 10%, while detecting only 10% of the Non-EDC and benign faults. We also explore the performance-resilience tradeoff space, by studying the effect of compiler optimizations on the error resilience of soft computing applications, both with and without our technique.

View record

Characterizing the JavaScript errors that occur in production web applications : an empirical study (2012)

Client-side JavaScript is being widely used in popular web applications to improve functionality, increase responsiveness, and decrease load times. However, it is challenging to build reliable applications using JavaScript. This work presents an empirical characterization of the error messages printed by JavaScript code in web applications, and attempts to understand their root causes.We find that JavaScript errors occur in production web applications, and that the errors fall into a small number of categories. In addition, we find that certain types of web applications are more prone to JavaScript errors than others. We further find that both non-deterministic and deterministic errors occur in the applications, and that the speed of testing plays an important role in exposing errors. Finally, we study the correlations among the static and dynamic properties of the application and the frequency of errors in it in order to understand the root causes of the errors.

View record

Hardware error detection in multicore parallel programs (2012)

The scaling of Silicon devices has exacerbated the unreliability of modern computer systems, and power constraints have necessitated the involvement of software in hardware error detection. Simultaneously, the multi-core revolution has impelled software to become parallel. Therefore, there is a compelling need to protect parallel programs from hardware errors.Parallel programs’ tasks have significant similarity in control data due to the use of high-level programming models. In this thesis, we propose BlockWatch to leverage the similarity in parallel program’s control data for detecting hardware errors. BlockWatch statically extracts the similarity among different threads of a parallel program and checks the similarity at runtime. We evaluate BlockWatch on eight SPLASH-2 benchmarks to measure its performance overhead and error detection coverage. We find that BlockWatch incurs an average overhead of 15% across all programs, and provides an average SDC coverage of 97% for faults in the control data.

View record

News Releases

This list shows a selection of news releases by UBC Media Relations over the last 5 years.

Current Students & Alumni


If this is your researcher profile you can log in to the Faculty & Staff portal to update your details and provide recruitment preferences.


Learn about our faculties, research, and more than 300 programs in our 2021 Graduate Viewbook!