Thomas Pasquier
Graduate Student Supervision
Master's Student Supervision
Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.
In computer systems, provenance graphs describe causal relationships among operating system entities (e.g., processes, files, and sockets) to represent a system's execution history. Provenance-based Intrusion Detection Systems (PIDSes) analyze these graphs to identify malicious execution patterns. Despite advances in PIDSes, measurements of detection performance often neglect the quality of detection reports. Prior work generates either coarse-grained alerts or fine-grained alerts (e.g., node-level alerts indicating which nodes in a graph are suspicious) with many false positives. As a result, security analysts grapple with overwhelming and often irrelevant data, leading to alert fatigue and frequent burnout. To address this issue, we present PROVNET, a node-level detector. Given a provenance graph, PROVNET detects abnormal nodes and generates node-level alerts using a temporal graph autoencoder framework. PROVNET then correlates the alerts to mitigate false positives and, based on the correlation results, reconstructs the attack subgraphs and generates a detection report that helps security analysts investigate the attack execution flow. We evaluate PROVNET against state-of-the-art systems on publicly available datasets, focusing on detection performance, run-time performance, and robustness. The results show that PROVNET achieves detection performance competitive with state-of-the-art systems, performs detection at run time with low latency, and is robust against state-of-the-art provenance-based evasion attacks.
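To make the setting concrete, the sketch below builds a toy provenance graph from (source, relation, target, timestamp) events over the process/file/socket entities the abstract describes, and scores each node by how unusual its behaviour is. The event names, the rarity-based score, and the flagged node are all illustrative inventions; PROVNET itself scores nodes with a temporal graph autoencoder, which this simple frequency heuristic only stands in for.

```python
from collections import defaultdict

# Hypothetical provenance events: (source, relation, target, timestamp),
# mirroring the OS entities (processes, files, sockets) in the abstract.
events = [
    ("bash",   "exec",    "curl",          1),
    ("curl",   "write",   "/tmp/payload",  2),
    ("bash",   "exec",    "python",        3),
    ("python", "read",    "/tmp/payload",  4),
    ("python", "connect", "10.0.0.5:4444", 5),
]

def node_event_profiles(events):
    """Collect the set of (relation, direction) behaviours seen per node."""
    profiles = defaultdict(set)
    for src, rel, dst, _t in events:
        profiles[src].add((rel, "out"))
        profiles[dst].add((rel, "in"))
    return profiles

def anomaly_scores(profiles):
    """Score each node by how rare its behaviours are across the graph.

    Stand-in for the temporal graph autoencoder: a real PIDS would score
    nodes by reconstruction error, not by simple behaviour rarity.
    """
    freq = defaultdict(int)
    for behaviours in profiles.values():
        for b in behaviours:
            freq[b] += 1
    n = len(profiles)
    return {node: sum(1 - freq[b] / n for b in behaviours)
            for node, behaviours in profiles.items()}

profiles = node_event_profiles(events)
scores = anomaly_scores(profiles)
suspicious = max(scores, key=scores.get)  # node with the rarest behaviour set
```

On this toy trace the process that both reads the dropped file and opens an outbound connection accumulates the highest score, which is the kind of node-level alert a detector would then correlate with its neighbours before reporting.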
Scientists often use complex multistep workflows to computationally analyze data. These workflows might include downloading datasets, installing packages, processing data, training models, and evaluating results. Such workflows are difficult to manage and track effectively: the fast-paced and iterative nature of research programming leads them to accumulate unused code, multiple versions of the same script, and untracked dependencies. These issues cause difficulties when researchers try to reproduce code that someone else has written, or even code they have written themselves. Research programmers can address these problems by collecting data provenance: a record of what happened during an experiment, including the files touched, the execution order, and the software dependencies. Provenance thus provides a record of experiment execution, but provenance graphs are often large and complicated, and they quickly become incomprehensible. We propose a new method that uses recent advances in large language model prompting to produce textual summaries of provenance graphs. We perform a user study comparing these textual summaries to traditional node-link diagrams for experiment reproduction tasks. Our results show that textual summaries are a promising approach to summarizing provenance for experiment reproduction, and qualitative results from the user study motivate future designs for reproducibility tools.
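As a minimal illustration of the pipeline's first half, the sketch below turns a small provenance record, covering the items the abstract names (files touched, execution order, software dependencies), into a prompt asking a language model for a textual summary. The record's contents, the `build_summary_prompt` helper, and the prompt wording are all hypothetical; the actual model call is omitted.

```python
# Hypothetical provenance record for a toy analysis workflow.
provenance = {
    "steps": [
        {"script": "download.py", "reads": [],                 "writes": ["data/raw.csv"]},
        {"script": "clean.py",    "reads": ["data/raw.csv"],   "writes": ["data/clean.csv"]},
        {"script": "train.py",    "reads": ["data/clean.csv"], "writes": ["model.pkl"]},
    ],
    "dependencies": {"pandas": "2.2.0", "scikit-learn": "1.5.0"},
}

def build_summary_prompt(prov):
    """Flatten a provenance record into an LLM prompt that requests a
    textual summary for experiment reproduction (illustrative only)."""
    lines = ["Summarize this computational experiment for reproduction:"]
    for i, step in enumerate(prov["steps"], 1):
        lines.append(
            f"Step {i}: {step['script']} reads {step['reads'] or 'nothing'} "
            f"and writes {step['writes']}")
    deps = ", ".join(f"{p}=={v}" for p, v in prov["dependencies"].items())
    lines.append(f"Software dependencies: {deps}")
    lines.append("Describe the execution order, required inputs, and outputs.")
    return "\n".join(lines)

prompt = build_summary_prompt(provenance)
```

The model's reply to such a prompt would be the kind of textual summary the user study compares against node-link diagrams.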