Graduate Student Supervision
Doctoral Student Supervision
Dissertations completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest dissertations.
In an era of hardware diversity, managing the memory allocated to applications is a complex task that can have significant performance repercussions. Non-uniformity in the memory hierarchy, along with the heterogeneity and asymmetry of chip designs, makes the cost of memory accesses unpredictable if the allocated memory is not managed carefully. Poor memory allocation and placement on symmetric non-uniform memory access (NUMA) server systems can cause interconnect link congestion and memory controller contention, which can drastically degrade performance. Asymmetric systems consisting of CPU and GPU cores suffer similarly, depending on how memory is allocated and which types of cores access it. Moreover, modern chip designs are integrating compute units into memory modules to bring compute closer to memory and eliminate the high cost of transferring data. Hence, accessing memory efficiently is becoming increasingly challenging as systems evolve toward heterogeneity. Our contribution is a detailed analysis of, and insight into, software and hardware memory management techniques on modern symmetric server-class processors, on asymmetric heterogeneous processors consisting of integrated CPU and GPU cores, and on recently commercialized Processing In Memory (PIM) systems. In particular, we examine three types of systems that are affected by memory access bottlenecks: first NUMA systems, then integrated CPU-GPU systems, and finally PIM systems. We analyze these systems and suggest possible solutions to mitigate memory access issues. For NUMA systems, we propose a holistic memory management algorithm that intelligently distributes memory pages to reduce congestion and improve performance. Then, we provide a detailed analysis of memory management methods on integrated CPU-GPU systems, focusing on performance and functionality trade-offs.
Our goal is to expose the performance impact of memory management techniques and to assess the viability and advantages of running applications with complex behaviors on such integrated CPU-GPU systems. Finally, we examine PIM systems through a case study of image decoding, a critical stage for many deep learning applications. We show that the extreme compute scalability of PIM systems can be used to accelerate image decoding, with performance potential that can surpass CPUs and GPUs.
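The abstract does not describe how the proposed algorithm distributes pages. The sketch below illustrates only the general idea. It spreads pages across NUMA nodes so that no single memory controller carries a disproportionate share of the traffic. Several parts of it are assumptions made for illustration:

- the per-page access counts,
- the greedy heaviest-first policy,
- the function name `place_pages`.

A real policy would also weigh locality, meaning placing pages near the threads that use them, and interconnect topology. This sketch ignores both.

```python
import heapq


def place_pages(page_accesses: dict[str, int], num_nodes: int) -> tuple[dict[str, int], list[int]]:
    """Hypothetical greedy, load-balancing page placement across NUMA nodes.

    Pages with the most accesses are placed first, each on the node whose
    memory controller currently carries the least accumulated traffic.
    Locality and interconnect topology are deliberately ignored.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    heap = [(0, node) for node in range(num_nodes)]  # (accumulated traffic, node id)
    heapq.heapify(heap)
    placement: dict[str, int] = {}
    # Heaviest pages first; ties broken by page name for determinism.
    for page, count in sorted(page_accesses.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)  # least-loaded node
        placement[page] = node
        heapq.heappush(heap, (load + count, node))
    loads = [0] * num_nodes
    for load, node in heap:
        loads[node] = load
    return placement, loads
```

For example, `place_pages({"a": 100, "b": 60, "c": 50, "d": 10}, 2)` places `a` and `d` on node 0 and `b` and `c` on node 1. Each node then carries 110 accesses.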
People store increasing amounts of personal data digitally, from emails to credit cards. Two prevalent places this data is stored are on cloud platforms hosted by third parties and on mobile devices, which are easily lost or stolen and which run any of millions of untrusted third-party applications. We explore security through isolation as a means to protect the sensitive data residing on cloud and mobile platforms. We carefully consider the attributes of each platform and the specifics of the attacks we are trying to protect against to select isolation mechanisms that provide the necessary security benefit without incurring an undue performance penalty. Today's cloud platforms provide isolation through virtualization boundaries, which are typically managed by a monolithic control VM. We decompose such monolithic entities to reduce the attack surface. We break apart the control VM of Xen, a mature virtualization platform, into least-privilege components. We leverage this disaggregation to restart these components frequently, reducing the time window for attacks. Today's mobile platforms provide isolation through passwords and process boundaries. However, these protection mechanisms do little once an attacker can access the physical memory directly. We encrypt sensitive data while it is in memory to prevent direct, physical access to it. We leverage cache locking to provide a safe location embedded within the system chip itself to decrypt application data as it is required. Sharing data between applications is crucial for mobile platforms and is achieved using inter-process communication (IPC). An attacker that gains control of the OS also gains access to all of this shared data. We encrypt IPC using a security monitor that operates outside the OS. Leveraging previous work on strong application boundaries, we provide end-to-end encrypted IPC, preventing a compromised OS from being able to access this sensitive data. We demonstrate three systems.
First, we disaggregate Xen's monolithic control VM, improving security while reducing performance by 2% or less for most benchmarks. Second, we protect sensitive data on mobile devices from physical memory attacks while keeping performance within 5% for normal Android application usage. Third, we protect all IPC on Android devices while incurring no noticeable performance overhead.
Master's Student Supervision
Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.
Reactive programs are programs that process external events, such as signals and messages. Device drivers and cloud microservices are examples of reactive programs. Systems built from reactive programs are concurrent and exhibit a high degree of nondeterminism, making non-exhaustive testing inadequate. A higher-level language can be used to write a specification: a formal definition of a program’s desired behaviour. Such specifications may be easier to reason about, but proofs of the specification say nothing of a low-level implementation. Program synthesis is one way to bridge this gap. A synthesizer searches a space of candidate programs for an implementation that satisfies the specification. In general, the number of candidate programs is infinite, making synthesis undecidable. Even a bounded search, e.g., on program length, soon becomes intractable as that search space grows exponentially. This work introduces COMET, a system for synthesizing reactive programs from unbounded nondeterministic iterative transformations (UNITY) specifications. Recent work in symbolic execution and solver-aided synthesis has advanced the state of the art, but symbolic techniques also lead to exponential blowup. COMET seeks to avoid exponential blowup by constraining the search space and synthesizing in small steps. COMET synthesizes non-trivial programs for sequential and concurrent languages by applying three techniques: symbolic execution of the specification into guarded traces, intermediate target trace synthesis, and recursive synthesis of target expressions. To evaluate this approach, I synthesize Arduino and Verilog programs from UNITY specifications of the Paxos consensus proposer and acceptor roles.
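A UNITY program is a set of guarded assignments. They execute in an arbitrary but fair order until the program reaches a fixed point. The sketch below is a hypothetical concrete interpreter for that execution model. It is not COMET's synthesizer, which works on symbolic traces.

The following are assumptions made for this sketch:

- the names `run_to_fixed_point` and `swap_stmt`,
- the dictionary state representation,
- the round-based fairness scheme.

The example program sorts four variables by repeatedly swapping adjacent out-of-order pairs. It reaches the same fixed point regardless of the order in which statements fire.

```python
import random
from typing import Callable

State = dict[str, int]
# A guarded assignment: when guard(state) holds, update(state) returns new values
# for some variables (hypothetical representation for this sketch).
Statement = tuple[Callable[[State], bool], Callable[[State], State]]


def run_to_fixed_point(state: State, statements: list[Statement],
                       seed: int = 0, max_rounds: int = 10_000) -> State:
    """Execute guarded assignments in a shuffled but fair order until nothing changes."""
    rng = random.Random(seed)
    state = dict(state)
    for _ in range(max_rounds):
        changed = False
        order = list(range(len(statements)))
        rng.shuffle(order)  # nondeterministic order, yet every statement runs each round
        for i in order:
            guard, update = statements[i]
            if guard(state):
                new_state = {**state, **update(state)}
                if new_state != state:
                    state = new_state
                    changed = True
        if not changed:
            return state  # fixed point: no statement can change the state
    raise RuntimeError("no fixed point reached within max_rounds")


def swap_stmt(i: int) -> Statement:
    """Hypothetical statement: swap x_i and x_{i+1} if they are out of order."""
    a, b = f"x{i}", f"x{i + 1}"
    return (lambda s: s[a] > s[b], lambda s: {a: s[b], b: s[a]})


if __name__ == "__main__":
    program = [swap_stmt(i) for i in range(3)]
    print(run_to_fixed_point({"x0": 3, "x1": 1, "x2": 4, "x3": 2}, program))
```

A full round with no state change means every statement was evaluated against the same state and left it unchanged. This is exactly UNITY's notion of a fixed point.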
Recent studies have demonstrated that the reproducibility of previously published computational experiments is inadequate. Many of these experiments are not reproducible because their authors never recorded or preserved the computational environment. This environment consists of artifacts such as packages installed in the language, libraries installed on the host system, file names, and the directory hierarchy. Researchers have created reproducibility tools to help mitigate this problem, but these tools do nothing for experiments that already exist in online repositories. The situation is not improving: researchers continue to publish results every year without using reproducibility tools. This is likely due to benign neglect, since it is common to believe that publishing the code and data is sufficient for reproducibility. To clarify the gap between what existing reproducibility tools can do and the state of published experiments, we define a framework with two kinds of researcher action. The first kind facilitates reproducibility while a computational environment is still available. The second kind enables reproduction of an experiment after that environment has been lost. The difference between these approaches lies in the availability of a computational environment. Researchers who provide access to the original computational environment practice proactive reproducibility, while those who do not leave only retroactive reproducibility possible. We present Reproducibility as a Service (RaaS), which is, to our knowledge, the first reproducibility tool explicitly designed to facilitate retroactive reproducibility. We demonstrate how RaaS can fix many of the common errors found in R scripts on Harvard's Dataverse and preserve the recreated computational environment.
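The abstract does not detail how RaaS repairs R scripts. One plausible first step for a retroactive tool is to recover the package list that the lost environment must have contained. The sketch below does this by scanning a script for `library()`, `require()`, and `pkg::function` references. This approach is an assumption, not RaaS's documented method. The function `infer_r_packages` is hypothetical, and its comment stripping is deliberately naive.

```python
import re

# library(pkg) or require(pkg), optionally quoted, e.g. library("dplyr")
_CALL = re.compile(r"\b(?:library|require)\(\s*['\"]?([A-Za-z][A-Za-z0-9.]*)['\"]?")
# Namespace-qualified calls: pkg::function or pkg:::function
_NS = re.compile(r"\b([A-Za-z][A-Za-z0-9.]*):::?")


def infer_r_packages(source: str) -> list[str]:
    """Hypothetical helper: list R packages an R script appears to depend on."""
    packages: set[str] = set()
    for line in source.splitlines():
        code = line.split("#", 1)[0]  # naive: ignores '#' inside string literals
        packages.update(_CALL.findall(code))
        packages.update(_NS.findall(code))
    return sorted(packages)
```

The resulting list could seed the installation step when rebuilding an environment. Packages loaded indirectly, such as through `library(pkg, character.only = TRUE)` with a variable name, would still be missed.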
The advent of the Internet of Things (IoT) makes it possible for tiny devices with sensing and communication capabilities to be interconnected and to interact with the cyber-physical world. However, these tiny devices are typically battery-powered and have limited memory, so they cannot run commodity operating systems designed for general-purpose computers, such as Windows and Linux. Embedded operating systems addressed this issue and established a solid foundation for developers to write applications for these tiny devices. IoT devices are deployed everywhere, from smart home appliances to self-driving vehicles. Their applications impose ever-increasing and increasingly heterogeneous demands on software architecture. Many special-purpose and embedded operating systems have been built to satisfy these wildly different requirements. They range from early sensor network operating systems, such as TinyOS and Contiki, to more modern robot and real-time control systems, such as FreeRTOS and Zephyr. However, the rapid evolution and heterogeneity of IoT applications call for a different solution. Specifically, this work introduces Tinkertoy, a set of standard operating system components from which developers can assemble a custom system. The custom system provides precisely the functionality an application needs. It does so using up to four times less memory than other IoT operating systems while delivering comparable performance.
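The sketch below illustrates the component-assembly idea. It computes which components a custom system needs: the requested ones plus their transitive dependencies, in dependency-first order. Components nobody asked for never reach the image. The catalog, the component names, and `assemble` are hypothetical. They do not reflect Tinkertoy's actual component set or build process.

```python
# Hypothetical component catalog: each component lists the components it depends on.
CATALOG: dict[str, list[str]] = {
    "context_switch": [],
    "timer": [],
    "scheduler": ["context_switch"],
    "mutex": ["scheduler"],
    "uart_driver": ["timer"],
    "network_stack": ["timer", "mutex"],
    "file_system": ["mutex"],
}


def assemble(requested: list[str], depends_on: dict[str, list[str]]) -> list[str]:
    """Return requested components plus transitive dependencies, dependencies first."""
    order: list[str] = []
    state: dict[str, str] = {}  # "visiting" while on the DFS stack, "done" afterwards

    def visit(name: str) -> None:
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"dependency cycle at {name!r}")
        if name not in depends_on:
            raise KeyError(f"unknown component {name!r}")
        state[name] = "visiting"
        for dep in depends_on[name]:
            visit(dep)
        state[name] = "done"
        order.append(name)  # post-order: every dependency is already in `order`

    for name in requested:
        visit(name)
    return order
```

Requesting only `uart_driver` yields `["timer", "uart_driver"]`. The scheduler, network stack, and file system stay out of the build.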