Matei Ripeanu


Research Classification

Computer Systems

Research Interests

Distributed Systems
High Performance Computing
Storage Systems
Data Analytics
Graph Analytics
Social Networks

Relevant Degree Programs



Doctoral Students
Postdoctoral Fellows
I support public scholarship, e.g., through the Public Scholars Initiative, and am available to supervise students and postdocs interested in collaborating with external partners as part of their research.
I am open to hosting Visiting International Research Students (non-degree, up to 12 months).

Complete these steps before you reach out to a faculty member!

Check requirements
  • Familiarize yourself with program requirements. You want to learn as much as possible from the information available to you before you reach out to a faculty member. Be sure to visit the graduate degree program listing and program-specific websites.
  • Check whether the program requires you to seek commitment from a supervisor prior to submitting an application. For some programs this is an essential step while others match successful applicants with faculty members within the first year of study. This is either indicated in the program profile under "Requirements" or on the program website.
Focus your search
  • Identify specific faculty members who are conducting research in your specific area of interest.
  • Establish that your research interests align with the faculty member’s research interests.
    • Read up on the faculty members in the program and the research being conducted in the department.
    • Familiarize yourself with their work, read their recent publications and past theses/dissertations that they supervised. Be certain that their research is indeed what you are hoping to study.
Make a good impression
  • Compose an error-free and grammatically correct email addressed to your specifically targeted faculty member, and remember to use their correct titles.
    • Do not send non-specific, mass emails to everyone in the department hoping for a match.
    • Address the faculty members by name. Your contact should be genuine rather than generic.
  • Include a brief outline of your academic background, why you are interested in working with the faculty member, and what experience you could bring to the department. The supervision enquiry form guides you with targeted questions. Be sure to craft compelling answers to these questions.
  • Highlight your achievements and why you are a top student. Faculty members receive dozens of requests from prospective students and you may have less than 30 seconds to pique someone’s interest.
  • Demonstrate that you are familiar with their research:
    • Convey the specific ways you are a good fit for the program.
    • Convey the specific ways the program/lab/faculty member is a good fit for the research you are interested in/already conducting.
  • Be enthusiastic, but don’t overdo it.
Attend an information session

G+PS regularly provides virtual sessions that focus on admission requirements and procedures and offer tips on how to improve your application.


Graduate Student Supervision

Doctoral Student Supervision (Jan 2008 - Nov 2019)
Approaches for building error resilient applications (2020)

Transient hardware faults have become one of the major concerns affecting the reliability of modern high-performance computing (HPC) systems. They can cause application failures such as crashes and silent data corruptions (SDCs), i.e., cases where the application produces an incorrect output. To mitigate the impact of these failures, HPC applications need to adopt fault tolerance techniques. Common fault tolerance practices include (i) characterization techniques, such as fault injection and architectural vulnerability factor (AVF)/program vulnerability factor (PVF) analysis; (ii) run-time error detection techniques; and (iii) error recovery techniques. However, these approaches have the following shortcomings: (i) fault injection is generally time-consuming and lacks predictive power, while AVF/PVF analysis offers low accuracy; (ii) prior techniques often do not fully exploit the program’s error resilience characteristics; and (iii) the application constantly pays a performance/storage overhead. This dissertation proposes approaches that improve the effectiveness and efficiency of the above techniques. In particular, it makes the following contributions. First, it proposes ePVF, a methodology that distinguishes crash-causing bits from the architecturally correct execution (ACE) bits and obtains a closer estimate of the SDC rate than PVF analysis (by 45% to 67%). To reduce the overall analysis time, ePVF samples representative patterns from the ACE bits and obtains a good approximation (less than 1% error) of the overall prediction.
This dissertation applies the ePVF methodology to error detection, which leads to a 30% lower SDC rate than the well-accepted hot-path instruction duplication technique. Second, this dissertation combines roll-forward and roll-back recovery schemes and demonstrates the resulting improvement in the overall efficiency of checkpoint/restart (C/R) with two systems: LetGo (for faults affecting computational components) and BonVoision (for faults affecting DRAM memory). Overall, LetGo elides 62% of the crashes caused by computational faults and converts them into continued execution; of these, 80% produce correct output, while a majority of the rest fall back on traditional roll-back recovery. BonVoision allows execution to continue to completion for 30% of DRAM detectable but uncorrectable errors (DUEs).
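As a rough illustration of the fault injection technique mentioned above (not the tooling used in the dissertation), the toy sketch below flips one random bit in either the loop index or the accumulator of a small hypothetical kernel, `checksum`, and classifies each run as benign, SDC, or crash by comparing it with a fault-free "golden" run. The kernel, the 16-bit output mask, and the outcome categories are assumptions chosen only so that all three outcomes can occur.

```python
import random

def checksum(data, fault=None):
    """Hypothetical kernel: weighted sum of `data`, truncated to 16 bits.

    `fault` is an optional (step, target, bit) tuple: at loop iteration `step`,
    flip `bit` of either the loop index ("index") or the accumulator ("acc").
    """
    acc, i, step = 0, 0, 0
    while i < len(data):
        if fault is not None and fault[0] == step:
            _, target, bit = fault
            if target == "index":
                i ^= 1 << bit          # corrupted index may go out of bounds
            else:
                acc ^= 1 << bit        # corrupted accumulator
        acc += data[i] * (i + 1)       # an IndexError here models a crash
        i += 1
        step += 1
    return acc % 65536                 # high-order bits are masked in the output

def injection_campaign(data, trials, seed=0):
    """Inject one random single-bit fault per run and return outcome rates."""
    rng = random.Random(seed)
    golden = checksum(data)
    counts = {"benign": 0, "sdc": 0, "crash": 0}
    for _ in range(trials):
        fault = (rng.randrange(len(data)), rng.choice(["index", "acc"]), rng.randrange(32))
        try:
            out = checksum(data, fault)
        except IndexError:
            counts["crash"] += 1
            continue
        counts["benign" if out == golden else "sdc"] += 1
    return {k: v / trials for k, v in counts.items()}
```

Calling `injection_campaign(list(range(100)), trials=2000)` returns the fraction of runs in each category. Even this toy needs thousands of runs to produce stable rates, which hints at why exhaustive injection campaigns are costly and why analytical estimates such as PVF/ePVF are attractive.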


Pattern matching in massive metadata graphs at scale (2020)

Pattern matching in graphs, that is, finding subgraphs that match a small template graph within a large background graph, is fundamental to graph analysis and serves a rich set of applications. Unfortunately, existing solutions have limited scalability, are difficult to parallelize, support only a limited set of search patterns, and/or focus on only a subset of real-world problems. This dissertation explores avenues toward designing a scalable solution for subgraph pattern matching. In particular, this work targets practical pattern matching scenarios in large-scale metadata graphs (also known as property graphs) and designs solutions for distributed-memory machines that address the two categories of matching problems, namely, exact and approximate matching. This work presents a novel algorithmic pipeline that bases pattern matching on constraint checking. The key intuition is that each vertex or edge participating in a match has to meet a set of constraints specified by the search template. The pipeline iterates over these constraints to eliminate all vertices and edges that do not participate in any match, reducing the background graph to exactly the set of matching vertices and edges. Additional analysis, such as full match enumeration, can then be performed on this reduced graph. Furthermore, the constraint checking algorithm admits a vertex-centric formulation, which makes it possible to harness existing high-performance, vertex-centric graph processing frameworks. The key contributions of this dissertation are the design of solutions that follow this constraint checking approach for exact matching and for a class of edit-distance-based approximate matching, and an experimental evaluation that demonstrates the effectiveness of these solutions.
To this end, this work presents the design and implementation of distributed, vertex-centric, asynchronous algorithms that guarantee a solution with 100% precision and 100% recall for arbitrary search templates. Through comprehensive evaluation, this work provides evidence that the scalability and performance advantages of the proposed approach are significant. The highlights are scaling experiments on massive-scale real-world (up to 257 billion edges) and synthetic (up to 4.4 trillion edges) graphs, at scales (1,024 compute nodes) orders of magnitude larger than those used in the past for similar problems.
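To make the constraint-checking intuition concrete, here is a minimal sequential sketch that covers only the simplest, neighborhood-level constraints: a background vertex keeps a template vertex as a candidate only if the labels agree and every template neighbor can be matched by one of its own neighbors. The function and parameter names are hypothetical. Such local checks never eliminate a vertex that takes part in a true match, but on their own they do not give the 100% precision the abstract reports; the full pipeline also checks further constraints and prunes edges, which this sketch omits. Each check reads only a vertex's neighbors, which is what makes a vertex-centric formulation possible.

```python
def prune_candidates(graph, labels, template, tlabels):
    """Local constraint checking on undirected, labeled graphs.

    graph, template: dict vertex -> set of neighbor vertices.
    labels, tlabels: dict vertex -> label.
    Returns dict background vertex -> set of template vertices it may still match.
    """
    # Initial constraint: a vertex can only match template vertices with its label.
    cand = {v: {q for q in template if tlabels[q] == labels[v]} for v in graph}
    changed = True
    while changed:                     # iterate until no candidate is eliminated
        changed = False
        for v in graph:
            for q in list(cand[v]):
                # Every template neighbor of q must be matchable by some neighbor of v.
                if not all(any(qn in cand[u] for u in graph[v]) for qn in template[q]):
                    cand[v].discard(q)
                    changed = True
    # Vertices with no remaining candidates cannot participate in any match.
    return {v: c for v, c in cand.items() if c}
```

For a path template A-B-C, only background vertices lying on some A-B-C path survive; an isolated C vertex, or a B vertex without both an A and a C neighbor, is pruned, and the pruning cascades to its neighbors.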


Accelerating irregular applications on parallel hybrid platforms (2015)

Future high-performance computing systems will be hybrid; they will include processors optimized for sequential processing and massively parallel accelerators. Platforms based on Graphics Processing Units (GPUs) are an example of this hybrid architecture: they integrate commodity CPUs and GPUs. This architecture promises intriguing opportunities: within the same dollar or energy budget, GPUs offer a significant increase in peak processing power and memory bandwidth compared to traditional CPUs, and are, at the same time, generally programmable. The adoption of GPU-based platforms, however, faces a number of challenges, including characterizing time/space/power tradeoffs, developing new algorithms that efficiently harness the platform, and abstracting the accelerators in a generic yet efficient way to simplify the task of developing applications on such hybrid platforms. This dissertation explores solutions to the above-mentioned challenges in the context of an important class of applications, namely irregular applications. Compared to regular applications, irregular applications have unpredictable memory access patterns and typically use reference-based data structures, such as trees or graphs; moreover, new applications in this class operate on massive datasets. Using novel workload partitioning techniques and data structures that better match the hybrid platform's characteristics, this work demonstrates that significant performance gains, in terms of both time to solution and energy, can be obtained by partitioning the irregular workload so that it is processed concurrently on the CPU and the GPU.
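One simple way to split an irregular graph workload between a CPU and a GPU, shown here only as an illustrative assumption rather than the dissertation's partitioning technique, is to visit vertices in decreasing degree order and greedily place them on the CPU while it stays within a chosen share of the edges, leaving the remaining (mostly low-degree) vertices for the GPU. The function name and the `cpu_edge_share` parameter are hypothetical.

```python
def partition_by_degree(adj, cpu_edge_share):
    """Split vertices between a CPU partition and a GPU partition.

    adj: dict vertex -> list of neighbors.
    cpu_edge_share: fraction (0..1) of all adjacency entries the CPU may hold.
    Vertices are visited in decreasing degree order and placed on the CPU while
    its edge budget allows; all other vertices go to the GPU.
    """
    total_edges = sum(len(nbrs) for nbrs in adj.values())
    budget = cpu_edge_share * total_edges
    cpu, gpu, used = set(), set(), 0
    for v in sorted(adj, key=lambda v: len(adj[v]), reverse=True):
        if used + len(adj[v]) <= budget:
            cpu.add(v)
            used += len(adj[v])
        else:
            gpu.add(v)
    return cpu, gpu
```

On a star graph with one hub and ten leaves and `cpu_edge_share=0.5`, the hub alone fills the CPU's budget and the ten leaves go to the GPU. In practice the share would have to be tuned to the relative throughput of the two processors.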


Security analysis of malicious socialbots on the web (2015)

The open nature of the Web, and of online social networks (OSNs) in particular, makes it possible to design socialbots: automation software that controls fake accounts in a target OSN and has the ability to perform basic activities similar to those of real users. In the wrong hands, socialbots can be used to infiltrate online communities, build up trust over time, and then engage in various malicious activities. This dissertation presents an in-depth security analysis of malicious socialbots on the Web, and on OSNs in particular. The analysis focuses on two main goals: (1) to characterize and analyze the vulnerability of OSNs to cyber attacks by malicious socialbots, social infiltration in particular, and (2) to design and evaluate a countermeasure that efficiently and effectively defends against socialbots. To achieve these goals, we first studied social infiltration as an organized campaign operated by a socialbot network (SbN), a group of programmable socialbots that are coordinated by an attacker in a botnet-like fashion. We implemented a prototypical SbN consisting of 100 socialbots and operated it on Facebook for 8 weeks. Among various findings, we observed that some users are more likely to become victims than others, depending on factors related to their social structure. Moreover, we found that traditional OSN defenses are not effective at identifying automated fake accounts or their social infiltration campaigns. Based on these findings, we designed Íntegro, an infiltration-resilient defense system that helps OSNs detect automated fake accounts via a user ranking scheme. In particular, Íntegro relies on a novel approach that leverages victim classification for robust graph-based fake account detection, with provable security guarantees. We implemented Íntegro on top of widely used, open-source distributed systems, on which it scaled nearly linearly.
We evaluated Íntegro against SybilRank, the state of the art in graph-based fake account detection, using real-world datasets and a large-scale, production-class deployment at Tuenti, the largest OSN in Spain, with more than 15 million users. We showed that Íntegro significantly outperforms SybilRank in ranking quality, allowing Tuenti to detect at least 10 times more fake accounts than its current abuse detection system.
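The sketch below illustrates, under simplifying assumptions, the general shape of graph-based fake account ranking referred to above: trust is spread from known-real seed accounts by an early-terminated power iteration (about log2(n) steps, as in SybilRank) and then normalized by weighted degree, so that accounts reachable from the seeds mainly through few, weak edges end up ranked low. In the spirit of Íntegro, edges touching accounts with a high predicted victim probability are down-weighted; the specific weighting `1 - max(p_u, p_v)`, the `1e-3` floor, and all names are hypothetical and not the published formulation.

```python
import math

def rank_accounts(adj, seeds, victim_prob):
    """Rank accounts by degree-normalized trust (higher = more likely real).

    adj: dict account -> set of friends (undirected graph).
    seeds: accounts known to be real.
    victim_prob: dict account -> predicted probability of being a victim.
    """
    def w(u, v):
        # Hypothetical weighting: down-weight edges incident to likely victims.
        return max(1e-3, 1.0 - max(victim_prob.get(u, 0.0), victim_prob.get(v, 0.0)))

    wdeg = {u: sum(w(u, v) for v in adj[u]) for u in adj}
    trust = {u: 0.0 for u in adj}
    for s in seeds:
        trust[s] = 1.0 / len(seeds)
    # Early-terminated power iteration: stop after O(log n) steps.
    for _ in range(math.ceil(math.log2(len(adj)))):
        nxt = {u: 0.0 for u in adj}
        for u in adj:
            if wdeg[u] == 0:
                continue
            for v in adj[u]:
                nxt[v] += trust[u] * w(u, v) / wdeg[u]
        trust = nxt
    return {u: trust[u] / wdeg[u] if wdeg[u] else 0.0 for u in adj}
```

In a small example with a four-account real community seeded at account 0, a three-account fake cluster, and a single attack edge to an account with victim probability 0.9, every real account ends up ranked above every fake one, because very little trust crosses the down-weighted attack edge before the iteration stops.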


Support for configuration and provisioning of intermediate storage systems (2015)

This dissertation focuses on supporting the provisioning and configuration of distributed storage systems in computer clusters that are designed to provide a high-performance computing platform for batch applications. These platforms typically offer a centralized, persistent backend storage system. To avoid the potential bottleneck of accessing the platform's backend storage system, intermediate storage systems aggregate resources allocated to the application to provide a shared temporary storage space dedicated to the application's execution. Configuring an intermediate storage system, however, is increasingly complex. As a distributed storage system, intermediate storage can employ a wide range of storage techniques that enable workload-dependent trade-offs over interrelated success metrics such as response time, throughput, storage space, and energy consumption. Because it is co-deployed with the application, it offers the user the opportunity to tailor its provisioning and configuration to extract maximum performance from the infrastructure. For example, the user can optimize performance by deciding the total number of nodes in an allocation, whether and how to split these nodes between the application and the intermediate storage, and the values of several configuration parameters for storage techniques with different trade-offs. This dissertation targets the problem of supporting the configuration and provisioning of intermediate storage systems in the context of workflow-based scientific applications that communicate via files (also known as many-task computing), as well as checkpointing applications. Specifically, this study proposes performance prediction mechanisms that estimate the performance of the overall application or of storage operations (e.g., an application's turnaround time, its energy consumption, or the response time of write operations).
By relying on the target application's characteristics, the proposed mechanisms can accelerate the exploration of the configuration space. The mechanisms use monitoring information available at the application level, requiring neither changes to the storage system nor specialized monitoring systems. The effectiveness of these mechanisms is evaluated in a number of scenarios, including different system scales, hardware platforms, and configuration choices. Overall, the mechanisms provide accuracy high enough to support the user's decisions about configuring and provisioning the storage system, while being 200x to 2000x less resource-intensive than running the actual applications.


Quantifying the value of peer-produced information in social tagging systems (2014)

Commons-based peer production systems are marked by three main characteristics: they are radically decentralized, non-proprietary, and collaborative. Peer production stands in stark contrast to production based on markets and/or on centralized organizations (e.g., carpooling vs. car rental; couch surfing vs. hotels; Wikipedia vs. Encyclopedia Britannica). Social tagging systems represent a class of web systems in which peer production is central to the design. In these systems, decentralized users collect, share, and annotate (or tag) content collaboratively to produce a public pool of annotated content. This uncoordinated effort helps fill the demand for labeling an ever-increasing amount of user-generated content on the web with textual information. Moreover, these labels (or simply tags) can be valuable as input to mechanisms such as personalized search or content promotion. Assessing the value of individual contributions to peer production systems is key to designing user incentives that encourage high-quality contributions. However, quantifying the value of peer-produced information such as tags is intrinsically challenging, as the value of information is inherently contextual and multidimensional. This research aims to address these two issues in the context of social tagging systems. To this end, this study sets forth the following hypothesis: assessing the value of peer-produced information in social tagging systems can be achieved by harnessing context and user behavior characteristics. The following questions guide the investigation. Characterization: (Q1) What are the characteristics of individual user activity? (Q2) What are the characteristics of social user activity? (Q3) What aspects influence users' perception of tag value? Design: (Q4) How can the value of tags for exploratory search be assessed? (Q5) What is the value of peer-produced information for content promotion? This study applies a mixed-methods approach.
The findings show that patterns of user activity can inform the design of supporting mechanisms for tagging systems. Moreover, the results suggest that the proposed method for assessing the value of tags is able to differentiate valuable tags from less valuable ones, as perceived by users. Finally, the analysis of the value of peer-produced information for content promotion shows that peer-produced sources can often outperform expert-produced sources.


Embracing diversity: optimizing distributed storage systems for diverse deployment environments (2013)

Distributed storage system middleware acts as a bridge between upper-layer applications and the lower-layer storage resources available in the deployment platform. Storage systems are expected to efficiently support the applications’ workloads while reducing the cost of the storage platform. In this context, two factors increase the complexity of storage system design. First, the applications’ workloads are diverse along a number of axes: read/write access patterns, data compressibility, and security requirements, to mention only a few. Second, the storage system should provide high performance within a given dollar budget. This dissertation addresses two interrelated issues in this design space. First, can the computational power of commodity massively multicore devices be exploited to accelerate storage system operations without increasing the platform cost? Second, is it possible to build a storage system that supports a diverse set of applications yet can be optimized for each one of them? This work provides evidence that, for some system designs and workloads, significant performance gains are brought by exploiting massively multicore devices and by optimizing the storage system for a specific application. Further, my work demonstrates that these gains are possible while still supporting the POSIX API and without requiring changes to the application. Finally, while these two issues can be addressed independently, a system that includes solutions to both of them enables significant synergies.


Towards improving the availability and performance of enterprise authorization systems (2009)

Authorization protects application resources by allowing only authorized entities to access them. Existing authorization solutions are largely based on the request-response model, where a policy enforcement point intercepts application requests, obtains authorization decisions from a remote policy decision point, and enforces those decisions. This model enables sharing the decision point as an authorization service across multiple applications. However, with many requests and resources, using a remote shared decision point increases latency and risks introducing a bottleneck and/or a single point of failure. This dissertation presents three approaches to addressing these problems. The first approach introduces and evaluates mechanisms for authorization recycling in role-based access control systems. The algorithms that support these mechanisms allow a local secondary decision point not only to reuse previously cached decisions but also to infer new and correct decisions based on two simple rules, thereby masking possible failures of the central authorization service and reducing network delays. Our evaluation results suggest that authorization recycling improves the availability and performance of distributed access control solutions. The second approach explores a cooperative authorization recycling system, where each secondary decision point shares its ability to make decisions with others through a discovery service. Our system does not require cooperating secondary decision points to trust each other. To maintain cache consistency at multiple secondary decision points, we propose alternative mechanisms for propagating update messages. Our evaluation results suggest that cooperation further improves the availability and performance of authorization infrastructures. The third approach examines the use of a publish-subscribe channel for delivering authorization requests and responses between policy decision points and enforcement points.
By removing enforcement points' dependence on a particular decision point, this approach helps improve system availability, as our analysis confirms, and reduces system administration/development overhead. We also propose several subscription schemes for different deployment environments and study them using a prototype system. Finally, we show that combining these three approaches can further improve the availability and performance of the authorization system, for example, by achieving a unified cooperation framework and using speculative authorizations.
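A minimal sketch of a local secondary decision point that recycles authorizations, assuming a plain role-based model in which a request is allowed exactly when some held role grants the permission (no negative permissions or separation-of-duty constraints). Under that assumption two inference rules are sound: an allow for role set R also covers any superset of R, and a deny for R also covers any subset of R. These rules, the class name, and the `remote_pdp` callable are illustrative assumptions and are not necessarily the two rules used in the dissertation.

```python
class RecyclingDecisionPoint:
    """Caches decisions from a remote PDP and infers new ones locally."""

    def __init__(self, remote_pdp):
        self.remote_pdp = remote_pdp   # callable(roles: frozenset, perm: str) -> bool
        self.allowed = []              # cached (roles, perm) pairs that were allowed
        self.denied = []               # cached (roles, perm) pairs that were denied

    def decide(self, roles, perm):
        """Return (decision, source); decision is None if it cannot be made."""
        roles = frozenset(roles)
        # Assumed rule 1: holding more roles cannot lose a permission.
        if any(p == perm and r <= roles for r, p in self.allowed):
            return True, "recycled"
        # Assumed rule 2: holding fewer roles cannot gain a permission.
        if any(p == perm and roles <= r for r, p in self.denied):
            return False, "recycled"
        try:
            decision = self.remote_pdp(roles, perm)
        except ConnectionError:
            return None, "unavailable"   # central service down and nothing to recycle
        (self.allowed if decision else self.denied).append((roles, perm))
        return decision, "remote"
```

Once a nurse's "read" request has been allowed and "write" denied by the remote service, the secondary decision point can still answer "read" for a user holding both nurse and doctor roles, and "write" for a user with no roles, while the central service is unreachable. Only requests that no cached decision covers fail.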


Master's Student Supervision (2010 - 2018)
Scale-free graph processing on a NUMA machine (2018)

High-performance graph processing is more important than ever for solving big data problems in high-impact applications. Graph processing incurs highly irregular memory accesses, which lead to poor data locality, load imbalance, and data-dependent parallelism. Distributed graph processing frameworks that employ memory-parallel, shared-nothing systems, such as Google's Pregel, have achieved tremendous success in terms of scale and performance. Modern shared-memory systems embrace the so-called Non-Uniform Memory Access (NUMA) architecture, which has proven to be more scalable (in terms of the number of cores and memory modules) than the Symmetric Multiprocessing (SMP) architecture. In many ways, a NUMA system resembles a shared-nothing distributed system: it has physically distinct processing cores and memory regions (although, in NUMA, memory is cache-coherent). Memory accesses to remote NUMA domains are more expensive than local accesses. This presents an opportunity to transfer the know-how and design of distributed graph processing to the development of shared-memory graph processing solutions optimized for NUMA systems, an approach that is surprisingly little explored. In this dissertation, we explore whether distributed-memory-like middleware that makes graph partitioning and communication between partitions explicit can improve performance on a NUMA system. We design and implement a NUMA-aware graph processing framework that treats the NUMA platform as a distributed system and embraces its design principles, in particular explicit partitioning and inter-partition communication. We further explore design trade-offs to reduce communication overhead and propose a solution that embraces the design philosophy of distributed graph processing systems while exploiting optimization opportunities specific to single-node systems.
We demonstrate up to 13.9x speedup over Polymer, a state-of-the-art NUMA-aware framework, and up to 3.7x scalability on a four-socket machine using graphs with tens of billions of edges.
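To illustrate what explicit partitioning and inter-partition communication mean, here is a sequential sketch of a bulk-synchronous breadth-first search in which each partition owns a subset of vertices (a hypothetical `v % num_parts` assignment), keeps private state, and affects other partitions only by appending to their message inboxes. On a real NUMA machine each partition would be bound to one socket and processed by its own threads; that placement, and any optimization of the message exchange, is outside this sketch.

```python
def partitioned_bfs(adj, source, num_parts):
    """BFS levels computed with per-partition state and explicit message passing.

    adj: dict int vertex -> iterable of neighbor vertices.
    Returns dict vertex -> BFS level for all vertices reachable from `source`.
    """
    def owner(v):
        return v % num_parts           # hypothetical hash partitioning

    level = [{} for _ in range(num_parts)]      # partition-private BFS levels
    frontier = [[] for _ in range(num_parts)]
    level[owner(source)][source] = 0
    frontier[owner(source)].append(source)
    depth = 0
    while any(frontier):
        # Communication phase: each partition sends neighbor IDs to their owners.
        inbox = [[] for _ in range(num_parts)]
        for p in range(num_parts):
            for u in frontier[p]:
                for v in adj[u]:
                    inbox[owner(v)].append(v)
        depth += 1
        # Computation phase: each partition updates only the vertices it owns.
        frontier = [[] for _ in range(num_parts)]
        for p in range(num_parts):
            for v in inbox[p]:
                if v not in level[p]:
                    level[p][v] = depth
                    frontier[p].append(v)
    merged = {}
    for part in level:
        merged.update(part)
    return merged
```

The result does not depend on `num_parts`; partitioning only changes which inbox a message lands in, which is exactly the traffic a NUMA-aware design would try to keep local or batch.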


Energy prediction for I/O intensive workflow applications (2014)

As workflow-based data-intensive applications have become increasingly popular, the lack of support tools to aid resource provisioning decisions, to estimate the energy cost of running such applications, or simply to support configuration choices has become increasingly evident. The goal of this thesis is to design techniques and tools to predict the energy consumption of these workflow-based applications, evaluate different optimization techniques from an energy perspective, and explore energy/performance tradeoffs. This thesis proposes a methodology to predict the energy consumption of workflow applications. More concretely, it makes three key contributions. First, it proposes a simple analytical energy consumption model that enables adequately accurate energy consumption predictions. This makes it possible not only to estimate energy consumption but also to reason about the relative benefits of different system configuration and provisioning decisions. Second, it carries out an empirical evaluation of energy consumption using synthetic benchmarks and real workflow applications. This evaluation quantifies the energy savings of performance optimizations for the distributed storage system as well as the energy and performance impact of power-centric tuning techniques. Third, it demonstrates the predictor’s ability to expose energy/performance tradeoffs for the synthetic benchmarks and workflow applications by evaluating the accuracy of the energy consumption predictions. Overall, the predictions achieve an average accuracy of more than 85% and a median accuracy of 90% across different scenarios, while using 200x fewer resources than running the actual applications.
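The abstract does not spell out the model, so the following is only a hypothetical example of what a simple analytical energy model can look like: every allocated node draws idle power for the whole run, and each component adds incremental power while it is busy. The `PowerProfile` fields and the per-component busy times are assumptions; in a real predictor the busy times would come from a performance model of the workflow.

```python
from dataclasses import dataclass

@dataclass
class PowerProfile:
    """Hypothetical per-node power figures, in watts."""
    idle: float        # baseline draw of a powered-on node
    cpu_active: float  # extra draw while computing
    io_active: float   # extra draw during storage I/O
    net_active: float  # extra draw during network transfers

def predict_energy(profile, nodes, makespan_s, cpu_s, io_s, net_s):
    """Predicted energy in joules.

    makespan_s: workflow turnaround time in seconds.
    cpu_s, io_s, net_s: busy time of each component, summed over all nodes.
    """
    baseline = profile.idle * nodes * makespan_s
    dynamic = (profile.cpu_active * cpu_s
               + profile.io_active * io_s
               + profile.net_active * net_s)
    return baseline + dynamic
```

For example, with `PowerProfile(100.0, 50.0, 10.0, 5.0)`, 4 nodes, a 10 s makespan, and 20/10/4 s of CPU/I/O/network busy time, the model predicts 4000 J of baseline energy plus 1120 J of dynamic energy, 5120 J in total. Comparing such estimates across candidate node counts is the kind of reasoning about provisioning decisions that the abstract describes.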


GPU hardware acceleration for industrial applications (2014)

No abstract available.

Towards a high-performance scalable storage system for workflow applications (2013)

This thesis is motivated by the urgent need to run scientific many-task workflow applications efficiently and easily on large-scale machines. These applications run at large scale on supercomputers and perform a large amount of storage I/O. The storage system is identified as the main bottleneck for many-task workflow applications on large-scale computers. The goal of this thesis is to identify opportunities and recommend solutions to improve the performance of many-task workflow applications. To achieve this goal, this thesis proposes a two-step solution. As the first step, this thesis recommends and designs an intermediate storage system that aggregates the resources available on compute nodes (local disks, SSDs, memory, and network) and provides the minimal POSIX API required by workflow applications. An intermediate storage system provides a high-performance scratch space for workflow applications and allows the applications to scale transparently, compared to regular shared storage systems. As the second step, this thesis performs a limit study of a workflow-aware storage system: an intermediate storage system that is tuned according to the I/O characteristics of a workflow application. Evaluation with synthetic and real workflow applications highlights the significant performance gains attainable by an intermediate storage system and a workflow-aware storage system. The evaluation shows that an intermediate storage system can bring up to a 2x performance gain compared to a central storage system. Further, a workflow-aware storage system can bring up to a 3x performance gain compared to a vanilla distributed storage system that is unaware of possible file-level optimizations. The findings of this research show that an intermediate storage system with a minimal POSIX API is a promising direction for providing a high-performance, scalable storage system for workflow applications.
The findings also strongly advocate for a workflow-aware storage system and provide design recommendations for achieving further performance gains.


Optimizing re-evaluation of malware distribution networks (2011)

The retrieval and analysis of malicious content is an essential task for security researchers. Security labs use automated HTTP clients known as client honeypots to visit hundreds of thousands of suspicious URLs daily. The dynamic nature of malware distribution networks necessitates periodic re-evaluation of a subset of the confirmed malicious sites, which introduces two problems: 1) the number of URLs requiring re-evaluation exhausts available resources, and 2) repeated evaluation exposes the system to adversarial blacklisting, which affects the accuracy of the content collected. To address these problems, I propose optimizations to the re-evaluation logic that reduce the number of re-evaluations while maintaining a constant sample discovery rate during URL re-evaluation. I study these problems in two adversarial scenarios: 1) monitoring malware repositories where no provenance is available, and 2) monitoring Fake Anti-Virus (AV) distribution networks. I perform a study of the adversary by repeatedly fetching content from the distribution networks. This reveals trends in the update patterns and lifetimes of the distribution sites and malicious executables. Using these observations, I propose optimizations to reduce the number of re-evaluations necessary to maintain a high malicious sample discovery rate. In the first scenario, the proposed techniques, when evaluated against a fixed-interval scheduler, are shown to reduce the number of re-evaluations by 80-93% (assuming a re-evaluation interval of 1 hour to 1 day) with a corresponding impact on the sample discovery rate of only 2-7%. In the second scenario, the proposed optimizations are shown to reduce fetch volume by orders of magnitude and, more importantly, reduce the likelihood of blacklisting. During direct evaluation of malware repositories, I observe multiple instances of blacklisting, but on the whole, less than 1% of the repositories studied show evidence of blacklisting.
Fake AV distribution networks actively blacklist IPs; I encountered repeated occurrences of IP blacklisting while monitoring Fake AV distribution networks.
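The thesis's specific optimizations are not described in this abstract. As a hedged illustration of the general idea of reducing re-evaluations relative to a fixed-interval scheduler, the sketch below simulates a hypothetical exponential back-off policy: the re-fetch interval doubles (up to a cap) each time a URL's payload is unchanged and resets to the minimum when a new payload is observed. `change_times` is simulated ground truth, used only to decide what each fetch sees; the function name and interval values are assumptions.

```python
import bisect

def simulate_refetch(change_times, horizon, policy, min_iv=1, max_iv=24):
    """Simulate re-evaluations of one URL over `horizon` hours.

    change_times: sorted hours at which the hosted payload changes.
    policy: "fixed" (always min_iv) or "adaptive" (exponential back-off).
    Returns (number of fetches, number of distinct payload versions observed).
    """
    t, interval, last_version = 0, min_iv, None
    fetches, seen = 0, set()
    while t <= horizon:
        version = bisect.bisect_right(change_times, t)  # payload version live at time t
        fetches += 1
        seen.add(version)
        if policy == "adaptive":
            # Reset after observing a new payload; otherwise back off, up to max_iv.
            interval = min_iv if version != last_version else min(2 * interval, max_iv)
        last_version = version
        t += interval
    return fetches, len(seen)
```

In this toy simulation, with payload changes at hours 100 and 300 over a 480-hour horizon, the fixed hourly scheduler fetches 481 times while the back-off policy fetches 32 times; both observe all three payload versions, but the back-off policy can detect each change up to `max_iv` hours late, which is the discovery-rate cost such optimizations trade against fewer fetches and less exposure to blacklisting.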




