Matei Ripeanu


Research Interests

Computer Systems
Data Analytics
Distributed Systems
Graph Analytics
High-Performance Computing
Social Networks
Storage Systems

Relevant Thesis-Based Degree Programs



Doctoral students
Postdoctoral Fellows
I support public scholarship, e.g. through the Public Scholars Initiative, and am available to supervise students and Postdocs interested in collaborating with external partners as part of their research.
I am open to hosting Visiting International Research Students (non-degree, up to 12 months).

Complete these steps before you reach out to a faculty member!

Check requirements
  • Familiarize yourself with program requirements. You want to learn as much as possible from the information available to you before you reach out to a faculty member. Be sure to visit the graduate degree program listing and program-specific websites.
  • Check whether the program requires you to seek commitment from a supervisor prior to submitting an application. For some programs this is an essential step while others match successful applicants with faculty members within the first year of study. This is either indicated in the program profile under "Admission Information & Requirements" - "Prepare Application" - "Supervision" or on the program website.
Focus your search
  • Identify specific faculty members who are conducting research in your area of interest.
  • Establish that your research interests align with the faculty member’s research interests.
    • Read up on the faculty members in the program and the research being conducted in the department.
    • Familiarize yourself with their work, read their recent publications and past theses/dissertations that they supervised. Be certain that their research is indeed what you are hoping to study.
Make a good impression
  • Compose an error-free and grammatically correct email addressed to your specifically targeted faculty member, and remember to use their correct titles.
    • Do not send non-specific, mass emails to everyone in the department hoping for a match.
    • Address the faculty members by name. Your contact should be genuine rather than generic.
  • Include a brief outline of your academic background, why you are interested in working with the faculty member, and what experience you could bring to the department. The supervision enquiry form guides you with targeted questions. Be sure to craft compelling answers to these questions.
  • Highlight your achievements and why you are a top student. Faculty members receive dozens of requests from prospective students and you may have less than 30 seconds to pique someone’s interest.
  • Demonstrate that you are familiar with their research:
    • Convey the specific ways you are a good fit for the program.
    • Convey the specific ways the program/lab/faculty member is a good fit for the research you are interested in/already conducting.
  • Be enthusiastic, but don’t overdo it.
Attend an information session

G+PS regularly offers virtual sessions that focus on admission requirements and procedures, and tips on how to improve your application.



These videos contain some general advice from faculty across UBC on finding and reaching out to a potential thesis supervisor.

Graduate Student Supervision

Doctoral Student Supervision

Dissertations completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest dissertations.

Asynchronous dynamic graph processing with real-time analysis (2024)

The rapid increase in connected data from various sources such as the World Wide Web, social networks, and financial transactions has led to the widespread use of graph-based representations for data analysis of these networks. However, traditional high-performance computing (HPC) solutions designed for static graphs are inefficient and impractical for dynamic graphs that evolve over time. This approach leads to high overheads, loss of information between snapshots, and potential correctness issues. The demand for fast, real-time analytics on continuously evolving real-world systems at a massive scale has become critical for applications such as online recommendations, financial fraud detection, and counter-terrorism. For example, social media networks like Facebook handle potentially millions of interactions per second, and payment networks like Visa process thousands of transactions per second. To address these challenges, my dissertation focuses on the opportunities and challenges of analyzing dynamic graphs in real-time, offering an infrastructure-algorithm co-design for dynamic graph analytics at scale. To this end, I develop and present a pattern for dynamic graph algorithms, and a supporting software infrastructure architecture, that together form a cohesive real-time graph analysis model. My algorithm pattern is designed to be amenable to distributed systems with concepts such as message passing, asynchronicity, and termination, while considering the timeliness requirements for real-time analysis. The infrastructure architecture considers real-world properties of dynamic data generation and hardware constraints, aiming for versatility, performance, and scalability. It supports dynamic graph topology evolution and provides interfaces for expressing algorithms for dynamic graph analysis and collecting results during runtime. I demonstrate that many common static graph algorithms can be re-designed for dynamic processing and real-time analysis, and can be built and scaled efficiently. My dynamic graph model offers advantages over static designs, such as low-cost updates to the graph and the ability to observe algorithm results before or after topology modifications. The implementation of my model shows near-linear scalability in performance, and supports real-time analysis at potentially orders of magnitude higher evolution rates compared to alternative designs, providing a generic, scalable, and performant solution for dynamic graph analysis and addressing the challenges of analyzing large-scale, continuously evolving network data in real-time.
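The algorithm pattern the abstract describes (asynchronous, vertex-centric updates interleaved with topology changes, with results queryable at any time) can be pictured with a deliberately tiny, single-process sketch. The class, names, and the dynamic-BFS example below are illustrative inventions, not the dissertation's actual framework:

```python
from collections import deque

class DynamicBFS:
    """Toy single-process sketch of an asynchronous, vertex-centric
    dynamic-graph algorithm: distances stay queryable while edges arrive."""

    def __init__(self, source):
        self.adj = {}               # vertex -> set of neighbours
        self.dist = {source: 0}     # current best-known distances
        self.queue = deque([source])

    def add_edge(self, u, v):
        # Topology mutation and algorithm state share one event loop,
        # so results can be observed between updates.
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)
        for x in (u, v):
            if x in self.dist:
                self.queue.append(x)
        self.process()

    def process(self):
        # "Message passing": a settled vertex relaxes its neighbours.
        while self.queue:
            u = self.queue.popleft()
            for v in self.adj.get(u, ()):
                if self.dist[u] + 1 < self.dist.get(v, float("inf")):
                    self.dist[v] = self.dist[u] + 1
                    self.queue.append(v)

g = DynamicBFS(source=0)
g.add_edge(0, 1)
g.add_edge(1, 2)
print(g.dist[2])   # 2 -- queryable immediately, no snapshot rebuild
g.add_edge(0, 2)
print(g.dist[2])   # 1 -- incrementally updated after the new edge
```

The structural point matches the abstract: an edge insertion triggers incremental, message-style relaxation rather than a full recomputation over a new snapshot.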


On paradigm shifts: enabling proactive defenses by identifying the vulnerable population, and online bitemporal dynamic graph analytics (2023)

Fueled by the massive amount of data and meta-data harvested by large-scale online service providers, two trends stand out: the broad adoption of machine learning, particularly for cybersecurity defenses, and the growing pains of temporal graph analytics, particularly for dynamically evolving systems. In this dissertation, my overarching goal is to explore novel ways: to effectively harness this harvested data to measurably improve the security of online platforms in general and their most vulnerable users in particular, and to explicitly model the temporal evolution of this data to efficiently enable business use cases that cannot be served by existing graph analytics systems. To that end, I advocate for a paradigm shift across two high-impact domains: cybersecurity and graph analytics. On the one hand, existing cybersecurity defenses are reactive and victim-agnostic: they are predicated on identifying the attacks/attackers, and do not take user characteristics into account. In contrast, I propose a proactive approach based on identifying the vulnerable population, and leveraging this information to improve the security of the platform in general and the most vulnerable users in particular. To that end, I approach harnessing the vulnerable population under a victim-centric defense paradigm while contrasting it against conventional defenses, and demonstrate its feasibility using four months of production data encompassing billions of events from hundreds of millions of users. To my knowledge, I am the first to propose and discuss such a defense paradigm. On the other hand, existing graph analytics systems are mostly static and non-temporal: they are not fully able to support modeling systems that evolve dynamically over time while supporting the queries (including current-state, historical, and audit queries) required by today's use cases. In contrast, I contend that future graph analytics systems should be online, dynamic, and employ bitemporal modeling at their core. To that end, I examine the use cases that are an ideal match for an online bitemporal dynamic graph analytics system, explore the design trade-off space, and develop and characterize several designs targeting different points within that space. To my knowledge, I am the first to propose, develop, and characterize such a system end-to-end.


Towards variability-aware frequency scaling on heterogeneous edge platforms (2023)

Recent Edge applications (e.g., machine learning inference) are becoming more sophisticated and computationally demanding. To meet their Quality of Service (QoS) objectives, today’s heterogeneous Edge platforms (e.g., the NVIDIA Jetson platform) incorporate several architectural innovations. One that stands out is the wide frequency configuration space (more than a dozen frequency levels per processing unit, spanning a 10x max-min ratio). We postulate that this can be harnessed to better navigate the trade-off space between performance and power consumption. This dissertation makes progress towards harnessing frequency scaling on Edge platforms. We start by exploring the potential gains from frequency scaling on the NVIDIA Jetson platform. To this end, we develop an empirical methodology to characterize the performance and power consumption behavior of the Jetson platform under different frequency configurations. Our characterization indicates that there is indeed an opportunity to improve performance and/or energy-efficiency with careful frequency configuration selection. However, one challenge is to estimate the impact of the frequency configuration choice on the performance and power consumption of the workload. To overcome this challenge, we employ machine learning techniques to build performance and power consumption models for the target workload. While developing these models, we find that the quality of their predictions depends on where the models are deployed (even if they are deployed on identical devices with identical software stacks). This leads us to postulate that variability in performance and power consumption among nominally identical Edge platforms exists and is sizeable. To investigate this hypothesis, we develop statistical tools that allow developers to detect, quantify, categorize, and compare variability. Then we present a set of actions one can take to mitigate the impact of variability. We evaluate all the techniques and approaches on two clusters of popular Edge platforms: the Jetson AGX and Nano. Finally, we focus on developing variability-aware performance and power consumption models. We show that not accounting for variability can severely impact the quality of predictions. The evaluation of the models shows that accounting for inter-node variability improves the Root Mean Square Error (RMSE) by 9.5% and 31.9% for the runtime and power models, respectively.
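The "detect and quantify variability" step can be sketched with minimal statistics: summarize each node's measurements, then compare nominally identical nodes. The node names, numbers, and the max/min-of-medians statistic below are hypothetical stand-ins for the dissertation's actual tool suite:

```python
import statistics

def variability_report(measurements):
    """Sketch of detecting/quantifying variability across nominally
    identical nodes: per-node medians plus their max/min spread."""
    medians = {node: statistics.median(vals)
               for node, vals in measurements.items()}
    spread = max(medians.values()) / min(medians.values())
    return medians, spread

# Hypothetical power draws (watts) from three "identical" Jetson boards.
power = {
    "node-a": [9.8, 10.1, 10.0, 9.9],
    "node-b": [10.9, 11.2, 11.0, 11.1],
    "node-c": [10.2, 10.0, 10.1, 10.3],
}
medians, spread = variability_report(power)
print(round(spread, 2))  # max/min ratio of per-node medians
```

Even this toy example shows the phenomenon the abstract reports: boards with identical hardware and software stacks can differ measurably, which is why models trained on one node may mispredict on another.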


Approaches for building error resilient applications (2020)

Transient hardware faults have become one of the major concerns affecting the reliability of modern high-performance computing (HPC) systems. They can cause failure outcomes for applications, such as crashes and silent data corruptions (SDCs) (i.e., the application produces an incorrect output). To mitigate the impact of these failures, HPC applications need to adopt fault tolerance techniques. The most common fault tolerance practices include (i) characterization techniques, such as fault injection and architectural vulnerability factor (AVF)/program vulnerability factor (PVF) analysis; (ii) run-time error detection techniques; and (iii) error recovery techniques. However, these approaches have the following shortcomings: (i) fault injections are generally time-consuming and lack predictive power, while AVF/PVF analysis offers low accuracy; (ii) prior techniques often do not fully exploit the program’s error resilience characteristics; and (iii) the application constantly pays a performance/storage overhead. This dissertation proposes comprehensive approaches to improve the above techniques in terms of effectiveness and efficiency. In particular, this dissertation makes the following contributions: First, it proposes ePVF, a methodology that distinguishes crash-causing bits from the architecturally correct execution (ACE) bits and obtains a closer estimate of the SDC rate than PVF analysis (by 45% to 67%). To reduce the overall analysis time, it samples representative patterns from the ACE bits and obtains a good approximation (less than 1% error) of the overall prediction. This dissertation applies the ePVF methodology to error detection, which leads to a 30% lower SDC rate than well-accepted hot-path instruction duplication. Second, this dissertation combines roll-forward and roll-back recovery schemes and demonstrates the improvement in the overall efficiency of checkpoint/restart (C/R) with two systems: LetGo (for faults affecting computational components) and BonVoision (for faults affecting DRAM memory). Overall, LetGo is able to elide 62% of the crashes caused by computational faults and convert them to continued execution (of these, 80% result in correct output, while a majority of the rest fall back on the traditional roll-back recovery technique). BonVoision is able to continue to completion 30% of the DRAM memory detectable but uncorrectable errors (DUEs).


Pattern Matching in Massive Metadata Graphs at Scale (2020)

Pattern matching in graphs, that is, finding subgraphs that match a smaller template graph within a large background graph, is fundamental to graph analysis and serves a rich set of applications. Unfortunately, existing solutions have limited scalability, are difficult to parallelize, support only a limited set of search patterns, and/or focus on only a subset of the real-world problems. This dissertation explores avenues toward designing a scalable solution for subgraph pattern matching. In particular, this work targets practical pattern matching scenarios in large-scale metadata graphs (also known as property graphs) and designs solutions for distributed-memory machines that address the two categories of matching problems, namely, exact and approximate matching. This work presents a novel algorithmic pipeline that bases pattern matching on constraint checking. The key intuition is that each vertex or edge participating in a match has to meet a set of constraints specified by the search template. The pipeline iterates over these constraints to eliminate all the vertices and edges that do not participate in any match, and reduces the background graph to the complete set of only the matching vertices and edges. Additional analysis can be performed on this reduced graph, such as full match enumeration. Furthermore, a vertex-centric formulation for this constraint checking algorithm exists, which makes it possible to harness existing high-performance, vertex-centric graph processing frameworks. The key contributions of this dissertation are a solution design following this constraint checking approach for exact and a class of edit-distance-based approximate matching, and an experimental evaluation that demonstrates the effectiveness of the respective solutions. To this end, this work presents the design and implementation of distributed, vertex-centric, asynchronous algorithms that guarantee a solution with 100% precision and 100% recall for arbitrary search templates. Through comprehensive evaluation, this work provides evidence that the scalability and performance advantages of the proposed approach are significant. The highlights are scaling experiments on massive-scale real-world (up to 257 billion edges) and synthetic (up to 4.4 trillion edges) graphs, at scales (1,024 compute nodes) orders of magnitude larger than those used in the past for similar problems.
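The constraint-checking intuition (prune every vertex that cannot satisfy the template's label and adjacency constraints, iterating to a fixpoint) can be illustrated with a toy sequential sketch. This is not the dissertation's distributed, asynchronous implementation; the function, the label/adjacency encoding, and the triangle example are all hypothetical:

```python
def prune(bg_adj, bg_label, tmpl_adj, tmpl_label):
    """Iteratively eliminate background vertices that cannot participate
    in any match of the template (a toy fixpoint constraint checker)."""
    # cand[v] = template vertices v might map to (label constraint first)
    cand = {v: {t for t, lb in tmpl_label.items() if lb == bg_label[v]}
            for v in bg_adj}
    changed = True
    while changed:
        changed = False
        for v in bg_adj:
            keep = set()
            for t in cand[v]:
                # Adjacency constraint: every template neighbour of t must
                # be matchable by some background neighbour of v.
                if all(any(tn in cand[n] for n in bg_adj[v])
                       for tn in tmpl_adj[t]):
                    keep.add(t)
            if keep != cand[v]:
                cand[v] = keep
                changed = True
    return {v for v, c in cand.items() if c}

# Triangle template A-B-C; the background has one triangle plus a tail
# vertex (4) whose label matches A but whose neighbourhood cannot.
tmpl_adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
tmpl_label = {"A": "x", "B": "y", "C": "z"}
bg_adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
bg_label = {1: "x", 2: "y", 3: "z", 4: "x"}
print(sorted(prune(bg_adj, bg_label, tmpl_adj, tmpl_label)))  # [1, 2, 3]
```

The reduced graph (vertices 1, 2, 3) is exactly the complete set of matching vertices, on which further analysis such as match enumeration could then run, mirroring the pipeline the abstract describes.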


Accelerating Irregular Applications on Parallel Hybrid Platforms (2015)

Future high-performance computing systems will be hybrid; they will include processors optimized for sequential processing and massively parallel accelerators. Platforms based on Graphics Processing Units (GPUs) are an example of this hybrid architecture: they integrate commodity CPUs and GPUs. This architecture promises intriguing opportunities: within the same dollar or energy budget, GPUs offer a significant increase in peak processing power and memory bandwidth compared to traditional CPUs, and are, at the same time, generally programmable. The adoption of GPU-based platforms, however, faces a number of challenges, including characterizing time/space/power trade-offs, developing new algorithms that efficiently harness the platform, and abstracting the accelerators in a generic yet efficient way to simplify the task of developing applications on such hybrid platforms. This dissertation explores solutions to the abovementioned challenges in the context of an important class of applications, namely irregular applications. Compared to regular applications, irregular applications have unpredictable memory access patterns and typically use reference-based data structures, such as trees or graphs; moreover, new applications in this class operate on massive datasets. Using novel workload partitioning techniques and by employing data structures that better match the hybrid platform's characteristics, this work demonstrates that significant performance gains, in terms of both time to solution and energy, can be obtained when partitioning the irregular workload to be processed concurrently on the CPU and the GPU.


Security Analysis of Malicious Socialbots on the Web (2015)

The open nature of the Web, and online social networks (OSNs) in particular, makes it possible to design socialbots—automation software that controls fake accounts in a target OSN and has the ability to perform basic activities similar to those of real users. In the wrong hands, socialbots can be used to infiltrate online communities, build up trust over time, and then engage in various malicious activities. This dissertation presents an in-depth security analysis of malicious socialbots on the Web, OSNs in particular. The analysis focuses on two main goals: (1) to characterize and analyze the vulnerability of OSNs to cyber attacks by malicious socialbots, social infiltration in particular, and (2) to design and evaluate a countermeasure to efficiently and effectively defend against socialbots. To achieve these goals, we first studied social infiltration as an organized campaign operated by a socialbot network (SbN)—a group of programmable socialbots that are coordinated by an attacker in a botnet-like fashion. We implemented a prototypical SbN consisting of 100 socialbots and operated it on Facebook for 8 weeks. Among various findings, we observed that some users are more likely to become victims than others, depending on factors related to their social structure. Moreover, we found that traditional OSN defenses are not effective at identifying automated fake accounts or their social infiltration campaigns. Based on these findings, we designed Íntegro—an infiltration-resilient defense system that helps OSNs detect automated fake accounts via a user ranking scheme. In particular, Íntegro relies on a novel approach that leverages victim classification for robust graph-based fake account detection, with provable security guarantees. We implemented Íntegro on top of widely-used, open-source distributed systems, in which it scaled nearly linearly.
We evaluated Íntegro against SybilRank—the state-of-the-art in graph-based fake account detection—using real-world datasets and a large-scale, production-class deployment at Tuenti, the largest OSN in Spain with more than 15 million users. We showed that Íntegro significantly outperforms SybilRank in ranking quality, allowing Tuenti to detect at least 10 times more fake accounts than their current abuse detection system.


Support for Configuration and Provisioning of Intermediate Storage Systems (2015)

This dissertation focuses on supporting the provisioning and configuration of distributed storage systems in clusters of computers that are designed to provide a high-performance computing platform for batch applications. These platforms typically offer a centralized persistent backend storage system. To avoid the potential bottleneck of accessing the platform's backend storage system, intermediate storage systems aggregate resources allocated to the application to provide a shared temporary storage space dedicated to the application's execution. Configuring an intermediate storage system, however, becomes increasingly complex. As a distributed storage system, intermediate storage can employ a wide range of storage techniques that enable workload-dependent trade-offs over interrelated success metrics such as response time, throughput, storage space, and energy consumption. Because it is co-deployed with the application, it offers the user the opportunity to tailor its provisioning and configuration to extract the maximum performance from the infrastructure. For example, the user can optimize performance by deciding the total number of nodes of an allocation, splitting these nodes, or not, between the application and the intermediate storage, and choosing the values of several configuration parameters for storage techniques with different trade-offs. This dissertation targets the problem of supporting the configuration and provisioning of intermediate storage systems in the context of workflow-based scientific applications that communicate via files -- also known as many-task computing -- as well as checkpointing applications. Specifically, this study proposes performance prediction mechanisms to estimate the performance of the overall application or of storage operations (e.g., an application's turn-around time, the application's energy consumption, or the response time of write operations). By relying on the target application's characteristics, the proposed mechanisms can accelerate the exploration of the configuration space. The mechanisms use monitoring information available at the application level, requiring neither changes to the storage system nor specialized monitoring systems. The effectiveness of these mechanisms is evaluated in a number of scenarios, including different system scales, hardware platforms, and configuration choices. Overall, the mechanisms provide accuracy high enough to support the user's decisions about configuring and provisioning the storage system, while being 200x to 2000x less resource-intensive than running the actual applications.


Quantifying the Value of Peer-Produced Information in Social Tagging Systems (2014)

Commons-based peer production systems are marked by three main characteristics: they are radically decentralized, non-proprietary, and collaborative. Peer production stands in stark contrast to market-based production and/or centralized organization (e.g., carpooling vs. car rental; couch surfing vs. hotels; Wikipedia vs. Encyclopedia Britannica). Social tagging systems represent a class of web systems where peer production is central to their design. In these systems, decentralized users collect, share, and annotate (or tag) content collaboratively to produce a public pool of annotated content. This uncoordinated effort helps fill the demand for labeling an ever-increasing amount of user-generated content on the web with textual information. Moreover, these labels (or simply tags) can be valuable as input to mechanisms such as personalized search or content promotion. Assessing the value of individuals' contributions to peer production systems is key to designing user incentives that bring in high-quality contributions. However, quantifying the value of peer-produced information such as tags is intrinsically challenging, as the value of information is inherently contextual and multidimensional. This research aims to address these two issues in the context of social tagging systems. To this end, this study sets forth the following hypothesis: assessing the value of peer-produced information in social tagging systems can be achieved by harnessing context and user behavior characteristics. The following questions guide the investigations. Characterization: (Q1) What are the characteristics of individual user activity? (Q2) What are the characteristics of social user activity? (Q3) What are the aspects that influence users' perception of tag value? Design: (Q4) How to assess the value of tags for exploratory search? (Q5) What is the value of peer-produced information for content promotion? This study applies a mixed-methods approach. The findings show that patterns of user activity can inform the design of supporting mechanisms for tagging systems. Moreover, the results suggest that the proposed method to assess the value of tags is able to differentiate valuable tags from less valuable ones, as perceived by users. Finally, the analysis of the value of peer-produced information for content promotion shows that peer-produced sources can oftentimes outperform expert-produced sources.


Embracing diversity: Optimizing distributed storage systems for diverse deployment environments (2013)

Distributed storage system middleware acts as a bridge between the upper-layer applications and the lower-layer storage resources available in the deployment platform. Storage systems are expected to efficiently support the applications’ workloads while reducing the cost of the storage platform. In this context, two factors increase the complexity of storage system design: First, the applications’ workloads are diverse along a number of axes: read/write access patterns, data compressibility, and security requirements, to mention only a few. Second, the storage system should provide high performance within a certain dollar budget. This dissertation addresses two interrelated issues in this design space. First, can the computational power of commodity massively multicore devices be exploited to accelerate storage system operations without increasing the platform cost? Second, is it possible to build a storage system that can support a diverse set of applications yet be optimized for each one of them? This work provides evidence that, for some system designs and workloads, significant performance gains are brought by exploiting massively multicore devices and by optimizing the storage system for a specific application. Further, my work demonstrates that these gains are possible while still supporting the POSIX API and without requiring changes to the application. Finally, while these two issues can be addressed independently, a system that includes solutions to both of them enables significant synergies.


Towards improving the availability and performance of enterprise authorization systems (2009)

Authorization protects application resources by allowing only authorized entities to access them. Existing authorization solutions are widely based on the request-response model, where a policy enforcement point intercepts application requests, obtains authorization decisions from a remote policy decision point, and enforces those decisions. This model enables sharing the decision point as an authorization service across multiple applications. But, with many requests and resources, using a remote shared decision point leads to increased latency and presents the risk of introducing a bottleneck and/or a single point of failure. This dissertation presents three approaches to addressing these problems. The first approach introduces and evaluates mechanisms for authorization recycling in role-based access control systems. The algorithms that support these mechanisms allow a local secondary decision point not only to reuse previously cached decisions but also to infer new and correct decisions based on two simple rules, thereby masking possible failures of the central authorization service and reducing network delays. Our evaluation results suggest that authorization recycling improves the availability and performance of distributed access control solutions. The second approach explores a cooperative authorization recycling system, where each secondary decision point shares its ability to make decisions with others through a discovery service. Our system does not require cooperating secondary decision points to trust each other. To maintain cache consistency at multiple secondary decision points, we propose alternative mechanisms for propagating update messages. Our evaluation results suggest that cooperation further improves the availability and performance of authorization infrastructures. The third approach examines the use of a publish-subscribe channel for delivering authorization requests and responses between policy decision points and enforcement points. By removing enforcement points' dependence on a particular decision point, this approach helps improve system availability, which is confirmed by our analytical evaluation, and reduces system administration/development overhead. We also propose several subscription schemes for different deployment environments and study them using a prototype system. We finally show that combining these three approaches can further improve authorization system availability and performance, for example, by achieving a unified cooperation framework and using speculative authorizations.
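The authorization-recycling idea (a secondary decision point reuses and infers decisions so that most requests never reach the remote, possibly failed, central decision point) can be sketched minimally. The single inference rule below ("a cached permit for a role covers any user holding that role") is a simplified invention for illustration, not the dissertation's actual pair of rules, and all names are hypothetical:

```python
class RecyclingSDP:
    """Toy secondary decision point: caches (role, resource, action)
    decisions and reuses them for any user holding that role."""

    def __init__(self, user_roles, remote_pdp):
        self.user_roles = user_roles   # user -> set of roles
        self.remote_pdp = remote_pdp   # callable(role, resource, action)
        self.cache = {}                # (role, resource, action) -> bool

    def authorize(self, user, resource, action):
        roles = self.user_roles[user]
        # Recycle: a cached permit for any of the user's roles suffices,
        # so no round-trip to the central PDP is needed.
        for r in roles:
            if self.cache.get((r, resource, action)):
                return True
        # Cache miss: consult the (slow, possibly unavailable) central PDP.
        for r in roles:
            decision = self.remote_pdp(r, resource, action)
            self.cache[(r, resource, action)] = decision
            if decision:
                return True
        return False

calls = []
def pdp(role, resource, action):
    calls.append(role)                       # count remote round-trips
    return role == "doctor" and resource == "chart"

sdp = RecyclingSDP({"alice": {"doctor"}, "bob": {"doctor", "intern"}}, pdp)
print(sdp.authorize("alice", "chart", "read"))  # True (remote PDP consulted)
print(sdp.authorize("bob", "chart", "read"))    # True (served from cache)
print(len(calls))                               # 1 -- bob needed no remote call
```

The second request is answered locally even though it is for a different user, which is the availability and latency benefit the abstract attributes to recycling: the secondary decision point keeps answering even if the central service is unreachable.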


Master's Student Supervision

Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.

EdgeEngine: A Thermal-Aware Optimization Framework for Edge Inference (2024)

Heterogeneous edge platforms enable the efficient execution of machine learning inference applications. These applications often have a critical constraint (such as meeting a deadline) and an optimization goal (such as minimizing energy consumption). To navigate this space, existing optimization frameworks adjust the platform's frequency configuration for the CPU, the GPU, and/or the memory controller. These optimization frameworks, however, are thermal-oblivious, disregarding the fact that edge platforms are frequently deployed in environments where they are exposed to ambient temperature variations. In this thesis, we first characterize the impact of ambient temperature on the power consumption and execution time of machine learning inference applications running on a popular edge platform, the NVIDIA Jetson TX2. Our rigorous data collection and statistical methodology reveals a sizeable ambient temperature impact on power consumption (about 20% on average, and up to 40% on some workloads) and a moderate impact on runtime (up to 5%). We also find that existing, thermal-oblivious optimization frameworks select frequency configurations that violate the application's constraints and/or are sub-optimal in terms of the assigned optimization goal. To address these shortcomings, we propose EdgeEngine, a lightweight thermal-aware optimization framework. EdgeEngine monitors the platform's temperature and uses reinforcement learning to adjust the frequency configuration of all underlying platform resources to meet the application's constraints. We find that EdgeEngine meets the application's constraint, and achieves up to 29% lower energy consumption (up to 2x) and up to 41% fewer violations compared to existing state-of-the-art thermal-oblivious optimization frameworks.
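The core loop the abstract describes (a learning agent picks a frequency configuration, observes latency and energy, and is rewarded for saving energy without violating the deadline) can be sketched as a toy epsilon-greedy bandit. EdgeEngine's actual agent, state space, and reward are more involved; the configurations and the latency/energy "measurement" formulas below are fabricated for illustration:

```python
import random

def choose_config(q, configs, eps=0.1):
    """Epsilon-greedy pick over frequency configurations."""
    if random.random() < eps:
        return random.choice(configs)          # explore
    return max(configs, key=lambda c: q[c])    # exploit best-known

def update(q, counts, config, latency, energy, deadline):
    # Reward favours low energy but strongly penalises deadline violations.
    reward = -energy if latency <= deadline else -1000.0
    counts[config] += 1
    q[config] += (reward - q[config]) / counts[config]  # running mean

# Hypothetical (cpu_ghz, gpu_ghz) configurations and a fake platform model.
configs = [(0.8, 0.7), (1.2, 0.9), (2.0, 1.3)]
q = {c: 0.0 for c in configs}
counts = {c: 0 for c in configs}
random.seed(0)
for _ in range(500):
    c = choose_config(q, configs)
    latency = 10.0 / (c[0] + c[1])              # faster clocks -> lower latency
    energy = 2.0 * (c[0] ** 2 + c[1] ** 2)      # but superlinear energy cost
    update(q, counts, c, latency, energy, deadline=6.0)

best = max(configs, key=lambda c: q[c])
print(best)  # the middle config: meets the deadline at lowest energy
```

The agent learns to avoid both the slowest configuration (deadline violations) and the fastest one (wasted energy), which is the constraint-plus-goal trade-off EdgeEngine navigates.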


Scale-free graph processing on a NUMA machine (2018)

The importance of high-performance graph processing for solving big-data problems in high-impact applications is greater than ever before. Graphs incur highly irregular memory accesses, which lead to poor data locality, load imbalance, and data-dependent parallelism. Distributed graph processing frameworks, such as Google's Pregel, that employ shared-nothing systems have experienced tremendous success in terms of scale and performance. Modern shared-memory systems embrace the so-called Non-Uniform Memory Access (NUMA) architecture, which has proven more scalable (in terms of number of cores and memory modules) than the Symmetric Multiprocessing (SMP) architecture. In many ways, a NUMA system resembles a shared-nothing distributed system: physically distinct processing cores and memory regions (although cache-coherent in NUMA), where accesses to remote NUMA domains are more expensive than local accesses. This poses the opportunity to transfer the know-how and design of distributed graph processing to shared-memory graph processing solutions optimized for NUMA systems, a direction that is surprisingly little explored.

In this dissertation, we explore whether a distributed-memory-like middleware that makes graph partitioning and communication between partitions explicit can improve performance on a NUMA system. We design and implement a NUMA-aware graph processing framework that treats the NUMA platform as a distributed system and embraces its design principles, in particular explicit partitioning and inter-partition communication. We further explore design trade-offs to reduce communication overhead, and propose a solution that embraces the design philosophies of distributed graph processing systems while exploiting optimization opportunities specific to single-node systems. We demonstrate up to 13.9x speedup over a state-of-the-art NUMA-aware framework, Polymer, and up to 3.7x scalability on a four-socket machine using graphs with tens of billions of edges.
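As a sketch of the "NUMA platform as a distributed system" idea, the following toy level-synchronous BFS partitions the vertex range across domains and routes all cross-partition traffic through explicit per-partition outboxes. The partitioning scheme and data structures are illustrative assumptions, not the thesis framework.

```python
# Illustrative sketch: each NUMA domain is treated as a partition with
# its own vertex range and an explicit outbox per destination partition,
# mirroring the shared-nothing design the dissertation adapts.

NUM_PARTITIONS = 2  # e.g. one per NUMA socket

def partition_of(v, n_vertices):
    # Contiguous range partitioning of the vertex ID space.
    return v * NUM_PARTITIONS // n_vertices

def bfs(edges, n_vertices, source):
    """Level-synchronous BFS with explicit inter-partition messages."""
    # Each partition stores only the adjacency lists of its own vertices.
    adj = [{} for _ in range(NUM_PARTITIONS)]
    for u, v in edges:
        adj[partition_of(u, n_vertices)].setdefault(u, []).append(v)

    dist = [-1] * n_vertices
    dist[source] = 0
    frontier = {partition_of(source, n_vertices): [source]}
    level = 0
    while frontier:
        # Each partition expands its local frontier and fills one outbox
        # per destination partition: the explicit "communication" step.
        outbox = [[] for _ in range(NUM_PARTITIONS)]
        for p, verts in frontier.items():
            for u in verts:
                for v in adj[p].get(u, []):
                    outbox[partition_of(v, n_vertices)].append(v)
        level += 1
        frontier = {}
        for p, msgs in enumerate(outbox):
            fresh = [v for v in set(msgs) if dist[v] == -1]
            for v in fresh:
                dist[v] = level
            if fresh:
                frontier[p] = fresh
    return dist
```

On a real NUMA machine the outbox buffers would be allocated on the destination domain's memory so that each partition only ever reads local memory, which is the key cost the design avoids.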


Energy prediction for I/O intensive workflow applications (2014)

As workflow-based data-intensive applications have become increasingly popular, the lack of support tools to aid resource-provisioning decisions, to estimate the energy cost of running such applications, or simply to support configuration choices has become increasingly evident. The goal of this thesis is to design techniques and tools to predict the energy consumption of these workflow-based applications, evaluate different optimization techniques from an energy perspective, and explore energy/performance trade-offs. This thesis proposes a methodology to predict the energy consumption of workflow applications. More concretely, it makes three key contributions. First, it proposes a simple analytical energy-consumption model that enables adequately accurate predictions, making it possible not only to estimate energy consumption but also to reason about the relative benefits different system-configuration and provisioning decisions offer. Second, an empirical evaluation of energy consumption is carried out using synthetic benchmarks and real workflow applications. This evaluation quantifies the energy savings of performance optimizations for the distributed storage system, as well as the energy and performance impact of power-centric tuning techniques. Third, it demonstrates the predictor's ability to expose energy/performance trade-offs for the synthetic benchmarks and workflow applications by evaluating the accuracy of the energy-consumption predictions. Overall, the predictions achieved an average accuracy of more than 85% and a median of 90% across different scenarios, while using 200x fewer resources than running the actual applications.
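A minimal sketch of what a simple analytical energy model of this kind might look like. The cost breakdown (idle power over the makespan plus per-component dynamic costs) and all coefficients are illustrative assumptions, not the model from the thesis.

```python
# Hedged sketch of an analytical energy model: total energy is the
# cluster's idle draw over the makespan plus the dynamic cost of CPU
# work, storage I/O, and network transfers. All coefficients below are
# made-up placeholders a user would calibrate for their own hardware.

def predict_energy_j(makespan_s, cpu_busy_s, bytes_io, bytes_net,
                     p_idle_w=60.0, p_cpu_extra_w=35.0,
                     j_per_gb_io=90.0, j_per_gb_net=50.0):
    idle = p_idle_w * makespan_s            # baseline draw for the whole run
    cpu = p_cpu_extra_w * cpu_busy_s        # extra draw while cores are busy
    io = j_per_gb_io * bytes_io / 1e9       # energy per GB of storage I/O
    net = j_per_gb_net * bytes_net / 1e9    # energy per GB moved on the network
    return idle + cpu + io + net

# Comparing two hypothetical configurations with such a model exposes the
# energy/performance trade-off: a faster run shortens the idle term but
# may spend more on I/O, and the cheaper total is not always the faster one.
```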


GPU hardware acceleration for industrial applications: using computation to push beyond physical limitations (2014)

This thesis explores the possibility of utilizing Graphics Processing Units (GPUs) to address the computational demand of algorithms used to mitigate inherent physical limitations in devices such as microscopes and 3D scanners. We investigate the outcome and test our methodology on two case studies: the narrow field of view found in microscopes, and the limited pixel resolution available in active 3D-sensing technologies such as laser scanners. The algorithms that mitigate these limitations suffer from high computational requirements, rendering them ineffective for time-sensitive applications. In our methodology, we exploit parallel programming and software engineering practices to efficiently harness the GPU's potential to provide the needed computational performance. Our goal is to show that it is feasible to use GPU hardware acceleration to meet the computational requirements of these algorithms in time-sensitive industrial applications. The results demonstrate this potential: we achieved twice the performance required to algorithmically extend the narrow field of view in microscopes for micro-pathology, and we reached the performance required to upsample the pixel resolution of a 3D scanner in real time, for use in autonomous excavation and collision detection in mining.


Towards a high-performance scalable storage system for workflow applications (2013)

This thesis is motivated by the urgent need to run scientific many-task workflow applications efficiently and easily on large-scale machines. These applications run at large scale on supercomputers and perform a large amount of storage I/O; the storage system is identified as the main bottleneck for many-task workflow applications on large-scale computers. The goal of this thesis is to identify the opportunities and recommend solutions to improve the performance of many-task workflow applications. To achieve this goal, the thesis proposes a two-step solution. As the first step, it recommends and designs an intermediate storage system that aggregates the resources available on compute nodes (local disks, SSDs, memory, and network) and provides the minimal POSIX API required by workflow applications. An intermediate storage system offers a high-performance scratch space for workflow applications and allows the applications to scale transparently compared to a regular shared storage system. As the second step, the thesis performs a limit study of a workflow-aware storage system: an intermediate storage system tuned to the I/O characteristics of a workflow application. Evaluation with synthetic and real workflow applications highlights the significant performance gains attainable: an intermediate storage system can bring up to 2x performance gain compared to a central storage system, and a workflow-aware storage system can bring up to 3x performance gain compared to a vanilla distributed storage system that is unaware of possible file-level optimizations. The findings of this research show that an intermediate storage system with a minimal POSIX API is a promising direction toward a high-performance, scalable storage system for workflow applications, and they provide design recommendations for workflow-aware storage systems to achieve further performance gains.
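A toy sketch, under assumptions of our own, of the intermediate-store idea: aggregate node-local memory behind a minimal POSIX-like surface (open/read/write/close) so one workflow stage can hand files to the next without touching the shared backend store. The class and its API below are hypothetical, not the thesis's system.

```python
# Illustrative in-memory intermediate store exposing a minimal
# POSIX-like API. A real system would back this with aggregated
# node-local disks/SSDs/memory; here a dict stands in for all of that.

class IntermediateStore:
    def __init__(self):
        self._files = {}    # path -> bytearray (stand-in for node-local space)
        self._fds = {}      # fd -> [path, offset]
        self._next_fd = 3   # fds 0-2 reserved, as in POSIX convention

    def open(self, path, create=False):
        if create:
            self._files.setdefault(path, bytearray())
        if path not in self._files:
            raise FileNotFoundError(path)
        fd = self._next_fd
        self._next_fd += 1
        self._fds[fd] = [path, 0]
        return fd

    def write(self, fd, data):
        path, off = self._fds[fd]
        self._files[path][off:off + len(data)] = data
        self._fds[fd][1] = off + len(data)
        return len(data)

    def read(self, fd, n):
        path, off = self._fds[fd]
        data = bytes(self._files[path][off:off + n])
        self._fds[fd][1] = off + len(data)
        return data

    def close(self, fd):
        del self._fds[fd]
```

Because producer and consumer stages of a workflow share this scratch space, intermediate files never hit the central storage system, which is where the abstract's 2x gain over a central store comes from.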




