Relevant Degree Programs
Affiliations to Research Centres, Institutes & Clusters
Complete these steps before you reach out to a faculty member!
- Familiarize yourself with program requirements. You want to learn as much as possible from the information available to you before you reach out to a faculty member. Be sure to visit the graduate degree program listing and program-specific websites.
- Check whether the program requires you to seek commitment from a supervisor prior to submitting an application. For some programs this is an essential step while others match successful applicants with faculty members within the first year of study. This is either indicated in the program profile under "Admission Information & Requirements" - "Prepare Application" - "Supervision" or on the program website.
- Identify specific faculty members who are conducting research in your specific area of interest.
- Establish that your research interests align with the faculty member’s research interests.
- Read up on the faculty members in the program and the research being conducted in the department.
- Familiarize yourself with their work, read their recent publications and past theses/dissertations that they supervised. Be certain that their research is indeed what you are hoping to study.
- Compose an error-free and grammatically correct email addressed to your specifically targeted faculty member, and remember to use their correct titles.
- Do not send non-specific, mass emails to everyone in the department hoping for a match.
- Address the faculty members by name. Your contact should be genuine rather than generic.
- Include a brief outline of your academic background, why you are interested in working with the faculty member, and what experience you could bring to the department. The supervision enquiry form guides you with targeted questions. Ensure to craft compelling answers to these questions.
- Highlight your achievements and why you are a top student. Faculty members receive dozens of requests from prospective students and you may have less than 30 seconds to pique someone’s interest.
- Demonstrate that you are familiar with their research:
- Convey the specific ways you are a good fit for the program.
- Convey the specific ways the program/lab/faculty member is a good fit for the research you are interested in/already conducting.
- Be enthusiastic, but don’t overdo it.
G+PS regularly provides virtual sessions that focus on admission requirements and procedures and tips how to improve your application.
Graduate Student Supervision
Doctoral Student Supervision (Jan 2008 - May 2021)
To answer an information need while performing a software task, a software developer sometimes has to interact with a lot of software artifacts. This interaction may involve reading through large amounts of information and many details of artifacts to find relevant information. In this dissertation, we propose the use of automatically generated natural language summaries of software artifacts to help a software developer more efficiently interact with software artifacts while trying to answer an information need. We investigated summarization of bug reports as an example of natural language software artifacts, summarization of crosscutting code concerns as an example of structured software artifacts and multi-document summarization of project documents related to a code change as an example of multi-document summarization of software artifacts. We developed summarization techniques for all the above cases. For bug reports, we used an extractive approach based on an existing supervised summarization system for conversational data. For crosscutting code concerns, we developed an abstractive summarization approach. For multi-document summarization of project documents, we developed an extractive supervised summarization approach. To establish the effectiveness of generated summaries in assisting software developers, the summaries were extrinsically evaluated by conducting user studies. Summaries of bug reports were evaluated in the context of bug report duplicate detection tasks. Summaries of crosscutting code concerns were evaluated in the context of software code change tasks. Multi-document summaries of project documents were evaluated by investigating whether project experts find summaries to contain information describing the reason behind the corresponding code changes. The results show that reasonably accurate natural language summaries can be automatically produced for different types of software artifacts and that the generated summaries are effective in helping developers address their information needs.
During the development of a software system, large amounts of new information, such as source code, work items and documentation, are produced continuously. As a developer works, one of his major activities is to consult portions of this information pertinent to his work to answer the questions he has about the system and its development. Current development environments are centered around models of the artifacts used in development, rather than of the people who perform the work, making it difficult and sometimes infeasible for the developer to satisfy his information needs.We introduce two developer-centric models, the degree-of-knowledge (DOK) model and the information fragments model, which support developers in accessing the small portions of information needed to answer the questions they have. The degree-of-knowledge model computes automatically, for each source code element in the development environment, a real value that represents a developer's knowledge of that element based on a developer's authorship and interaction data. We present evidence that shows that both authorship and interaction information are important in characterizing a developer's knowledge of code. We report on the usage of our model in case studies on expert finding, knowledge transfer and identifying changes of interest. We show that our model improves upon an existing expertise finding approach and can accurately identify changes for which a developer should likely be aware. Finally, we discuss the robustness of the model across multiple development sites and teams.The information fragment model automates the composition of different kinds of information and allows developers to easily choose how to display the composed information. We show that the model supports answering 78 questions that involve the integration of information siloed by existing programming environments. We identified these questions from interviews with developers. We also describe how 18 professional developers were able to use a prototype tool based on our model to successfully and quickly answer 94% of eight of the 78 questions posed in a case study. The separation of composition and presentation supported by the model, allowed the developers to answer the questions according to their personal preferences.
Master's Student Supervision (2010 - 2020)
A software developer works on many tasks per day, frequently switching back and forth between their tasks. This constant churn of tasks makes it difficult for a developer to know the specifics of what tasks they worked on, and when they worked on them. Consequently, activities such as task resumption, planning, retrospection,and reporting become complicated. To help a developer determine which tasks they worked on and when these tasks were performed, we introduce two novel approaches. First, an approach that captures the contents of a developer’s active window at regular intervals to create vector and visual representations of the work in a particular time interval. Second, an approach that automatically detects the times at which developers switch tasks, as well as coarse grained information about the type of the task. To evaluate the first approach, we created a data set with multiple developers working on the same set of six information seeking tasks. To evaluate the second approach, we conducted two field studies, collecting data from a total of 25 professional developers. Our analyses show that our approaches enable: 1) segments of a developer’s work to be automatically associated with a task from a known set of tasks with average accuracy of 70.6%, 2) a visual representation of a segment of work performed such that a developer can recognize the task with average accuracy of 67.9%, 3) the boundaries of a developer’s task to be detected with an accuracy as high as 84%, and 4) the coarse grained type of a task that a developer works on to be detected with 61% accuracy.
Software developers use commits to track source code changes made to a project, and to allow multiple developers to make changes simultaneously. To ensure that the commits can be traced to the issues that describe the work to be performed, developers typically add the identifier of the issue to the commit message to link commits to issues. However, developers are not infallible and not all desirable links are captured manually. To help find and improve links that have been manually specified, several techniques have been created. Although many software engineering tools, like defect predictors, depend on the links between commits and issues, there is currently no way to assess the quality of existing links. To provide a means of assessing the quality of links, I propose two quality attributes: completeness and consistency. Completeness measures whether all appropriate commits link to an issue, and consistency measures whether commits are linked to the most specific issue. I applied these quality attributes to assess a number of existing link techniques and found that existing techniques to link commits to issues lack both completeness and consistency in the links that they created. To enable researchers to better assess their techniques, I built a dataset that improves the link data for two open source projects. In addition, I provide an analysis of information in issue repositories in the form of relationships between issues that might help improve existing link augmentation techniques.
Software provided under open source licenses is widely used, fromforming high-profile stand-alone applications (e.g., Mozilla Firefox)to being embedded in commercial offerings (e.g., network routers).Despite the high frequency of use of open source licenses, there hasbeen little work about whether software developers understand the opensource licenses they use. To helpunderstand whether or not developers understand the open sourcelicenses they use, I conducted a survey that posed developmentscenarios involving three popular open source licenses (GNU GPL 3.0,GNU LGPL 3.0 and MPL 2.0) both alone and in combination. The 375respondents to the survey, who were largely developers, gave answersconsistent with those of a legal expert's opinion in 62% of 42cases. Although developers clearly understood cases involving one license,they struggled when multiple licenses were involved. To understand the context in which licensing issues arise in practice, I analyzed real-world questions posed by developers on online question-and-answer communities. The analysis of these questions indicate that licensing issues can constrain software evolution and technical decisions can have an impact on future licensing issues. Finally, I interviewed software developers in industry to understand how developers reason about and handle license incompatibility in practice. The developers I interviewed are cautious of restrictive licenses. To identify potential licensing issues, these developers rely on licensing guidelines provided by their organization and sometimes use specialized tools to automatically detect licensing issues in their projects. When faced with a situation in which a component that suits their needs is not compatible, developers tend to look for alternative components made available by open source communities. They sometimes leverage the technical architecture of their projects to enable the use of components under restrictive licenses and might rewrite the required functionality if necessary. An analysis ofthe results indicate a need for tool support to help guide developers in understanding the structure of the code and the technical details of a project while taking into account the exact requirements imposed by the licenses involved.
Software developers have many tools at their disposal that use a variety of sophisticated technology, such as static analysis and model checking, to help find defects before software is released. Despite the availability of such tools, software development still relies largely on human inspection of code to find defects. Many software development projects use code reviews as a means to ensure this human inspection occurs before a commit is merged into the system. Known as modern code review, this approach is based on tools, such as Gerrit, that help developers track commits for which review is needed and that help perform reviews asynchronously. As part of this approach, developers are often presented with a list of open code reviews requiring attention. Existing code review tools simply order this list of open reviews based on the last update time of the review; it is left to a developer to find a suitable review on which to work from a long list of reviews. In this thesis, we present an investigation of four algorithms that recommend an ordering of the list of open reviews based on properties of the reviews. We use a simulation study over a dataset of six projects from the Eclipse Foundation to show that an algorithm based on ordering reviews from least lines of code modified in the changes to be reviewed to most lines of code modified out performs other algorithms. This algorithm shows promise for eliminating stagnation of reviews and optimizing the average duration reviews are open.
Software is not built in isolation but builds on other software. When one project relies on software produced by another project, we say there is a technical dependence between the projects. The socio-technical congruence literature suggests that when there is a technical dependence there may need to be a social dependence. We investigate the alignment between social interactions and technical dependence in a software ecosystem.We performed an exploratory study of 250 Java projects on GitHub that use Maven for build dependences. We create a social interaction graph based on developers’ interactions on issue and pull requests. We compare the social interaction graph with a technical dependence graph representing library dependences between the projects in the ecosystem, to get an overview of the congruence, or lack thereof, between social interactions and technical dependences. We found that in 23.6% of the cases in which there is a technical dependence between projects there is also evidence of social interaction between project members. We found that in 8.67% of the cases in which there is a social interaction between project members, there is a technical dependence between projects.To better understand the situations in which there is congruence between the social and technical graphs, we examine pairs of projects that meet this criteria. We identify three categories of these project pairs and provide a quantitative and qualitative comparison of project pairs from each category. We found that for 45 (32%) of project pairs, no social interaction had taken place before the introduction of technical dependence and interactions after the introduction of the dependence are often about upgrading the library being depended upon. For 49 (35%) of project pairs, 75% of the interaction takes place after the introduction of the technical dependence. For the remaining 45 (32%) of project pairs, less than 75% of the interaction takes place after the introduction of the technical dependence. In the latter two cases, although there is interaction before the technical dependence is introduced, it is not always about the dependence.
Despite an abundance of commands to make tasks easier to perform, the users of feature-rich applications, such as development environments and AutoCAD applications, use only a fraction of the commands available due to a lack of awareness of the existence of many commands. Earlier work has shown that command recommendation can improve the usage of a range of commands available within such applications.In this thesis, we address the command recommendation problem, in which, given the command usage history of a set of users, the objective is to predict a command that is likely useful for the user to learn. We investigate two approaches to address the problem.The first approach is built upon the hypothesis that users of feature-rich applications who have similar features tend to use the same commands, and also, a specific user tends to use commands with similar features. Building on this hypothesis, we describe a supervised learning framework that exploits features from a user-command network to predict new links among users and commands. The second approach is built upon three hypotheses. First, we hypothesize that in feature-rich applications there exists co-occurrence patterns between commands. Second, we hypothesize that users of feature-rich applications have prevalent discovery patterns. Finally, we hypothesize that users need different recommendations based on the time elapsed between their last activity and the time of recommendation. To generate recommendations, we obtain co-occurrence and discovery patterns from the command usage history of a large set of users of the same feature-rich application. Subsequently, for each user, we produce recommendations based on the user's command usage history, co-occurrence and discovery patterns, and time elapsed since the last command usage. We refer to the algorithm we developed according to this approach as CoDis.Empirical experiments on data submitted by users of an integrated development environment (Eclipse) demonstrate that CoDis achieves significant performance improvements over link prediction, standard algorithms used for command recommendation, and matrix factorization techniques that are known to perform well in other domains. Compared to ADAGRAD, the best performing baseline, it achieves an improvement of 10.22% in recall, for a top-N recommendation task (N=20).
Recently, evaluation of a recommender system has been beyond evaluating just the algorithm. In addition to accuracy of algorithms, user-centric approaches evaluate a system’s effectiveness in presenting recommendations, explaining recommendations and gaining users’ confidence in the system. Existing research focuses on explaining recommendations that are related to user’s current task. However, explaining recommendations can prove useful even when recommendations are not directly related to user’s current task. Recommendations of development environment commands to soft- ware developers is an example of recommendations that are not related to the user’s current task, which is primarily focussed on programming, rather than inspecting recommendations.In this dissertation, we study three different kinds of explanations for IDE commands recommended to software developers. These explanations are inspired by the common approaches based on literature in the domain. We describe a lab-based experimental study with 24 participants where they performed programming tasks on an open source project. Our results suggest that explanations affect users’ trust of recommendations, and explanations reporting the system’s confidence in recommendation affects their trust more. The explanation with system’s confidence rating of the recommendations resulted in more recommendations being investigated. However, explanations did not affect the uptake of the commands. Our qualitative results suggest that recommendations, when not user’s primary focus, should be in context of his task to be accepted more readily.
The web is an increasingly important source of development-related resources, such as code examples, tutorials, and API documentation. Yet existing integrated development environments do little to assist the developer in finding and utilizing these resources. In this work, we explore how to provide useful web page recommendations to developers by focusing on the problem of refinding previously-visited web pages. We present the results of a formative study, in which we measured how often developers return to code-related web pages, and the methods they use to find those pages. Considering only revisits which occurred at least 15 minutes after the previous visit, and are therefore unlikely to be a consequence of browsing search results, we found a code-related recurrence rate of 13.7%. Only 7.4% of these code-related revisits were initiated through a bookmark of some kind, indicating the majority involved some manual effort to refind. To assist developers with code-related revisits, we developed Reverb, a tool which displays a list of dynamic bookmarks that pertain to the code visible in the editor. Reverb’s bookmarks are generated by building queries from the classes and methods referenced in the local code context and running these queries against a full-text index of the developer’s browsing history, as collected from popular browsers used. We describe Reverb’s implementation and present results from a study in which developers used Reverb while working on their own coding tasks. Our results suggest that local code context can help in making useful recommendations.
Software developers often confront questions such as "Why was the code implemented this way"? To answer such questions, developers make use of information in a software system's bug and source repositories. In this thesis, we consider two user interfaces for helping a developer to explore information from such repositories. One user interface, from Holmes and Begel's Deep Intellisense tool, exposes historical information across several integrated views, favouring exploration from a single code element to all of that element's historical information. The second user interface, in a tool called Rationalizer that we introduce in this thesis, integrates historical information into the source code editor, favouring exploration from a particular code line to its immediate history. We introduce a model to express how software repository information is connected and use this model to compare the two interfaces. Through a laboratory study, we found that our model can help to predict which interface is helpful for two particular kinds of historical questions. We also found deficiencies in the interfaces that hindered users in the exploration of historical information. These results can help inform tool developers who are presenting historical information from software repositories, whether that information is retrieved directly from the repository or derived through software history mining.