Using acceptance tests to predict files changed by programming tasks

Abstract:

In a collaborative development context, conflicting code changes might compromise software quality and developers' productivity. To reduce conflicts, one could avoid the parallel execution of potentially conflicting tasks. Although promising, this strategy is challenging because it relies on predicting the file changes required to complete a task. As predicting such file changes is hard, we investigate its feasibility for BDD (Behaviour-Driven Development) projects, which write automated acceptance tests before implementing features. We develop a tool that, for a given task, statically analyzes the code that automates the tests and infers test-based interfaces (files that could be executed by the tests), approximating the files that would be changed by the task. To assess the accuracy of this approximation, we measure the precision and recall of test-based interfaces for 513 tasks from 18 Rails projects on GitHub that use Cucumber as their acceptance test tool. We also compare such interfaces with randomly defined interfaces, interfaces obtained by the textual similarity of test specifications with past tasks, and interfaces computed by executing tests. Our results give evidence that, in the specific context of BDD, Cucumber tests might help to predict the files changed by tasks. We find that the better the test coverage, the better the predictive power. A hybrid approach for computing test-based interfaces is promising.

Study Setup

In brief, we performed three main search rounds. In the first round, we restricted the maximum number of stars per project and sorted the results in descending order of star count, hoping to select more meaningful and popular projects. In the second round, we sorted the results by the date of the last update, to analyze the most recently active projects first. Finally, we verified the Ruby projects referenced by Cucumber's site. As a result, we obtained a set of 61 Rails projects that use Cucumber, and a subset of 18 projects that additionally use SimpleCov or Coveralls, which are coverage tools.
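To make the first two search rounds concrete, the minimal Java sketch below queries GitHub's repository search API with a star cap and descending star ordering (round one) and with ordering by last update (round two). The query terms and the star cap are illustrative placeholders, not the exact values used in the study.

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;

    public class RepoSearch {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();

            // Round 1: cap the star count (placeholder value) and sort by stars, descending.
            String q1 = URLEncoder.encode("cucumber language:ruby stars:<=30000",
                    StandardCharsets.UTF_8);
            String round1 = "https://api.github.com/search/repositories?q=" + q1
                    + "&sort=stars&order=desc";

            // Round 2: same query terms, sorted by the date of the last update instead.
            String q2 = URLEncoder.encode("cucumber language:ruby", StandardCharsets.UTF_8);
            String round2 = "https://api.github.com/search/repositories?q=" + q2
                    + "&sort=updated&order=desc";

            for (String url : new String[] { round1, round2 }) {
                HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                System.out.println(response.body()); // JSON page of candidate repositories
            }
        }
    }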

After obtaining relevant projects that use the tools of interest for our study, we further filter out projects that lack tasks contributing both production code and Cucumber tests. To extract tasks from a given project, we clone the project repository and search for merge commits (excluding fast-forward merges) using the JGit API, sorting them in descending chronological order. We consider merge commits performed up to September 30th, 2017. We then extract two tasks from each merge commit, each one corresponding to one of the merged contributions. As a preliminary filter, we select tasks that change both production and Gherkin test files. Moreover, for performance reasons while computing interfaces, we discard tasks that exceed 500 commits. This mining phase refines the set of 61 projects from the previous phase, discarding 30 projects (14 projects contain no merge commits and 16 projects contain no task that changes both production and Gherkin files). In brief, we have a set of tasks extracted from 31 Rails projects that use Cucumber, and a subset of 15 projects that additionally use the mentioned coverage tools. Accordingly, we have a set of tasks for which we can possibly compute all of the interfaces we study here except DTestI, and a subset of tasks for which we can also compute DTestI.
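The Java sketch below illustrates this mining step with the JGit API, assuming a locally cloned repository. It keeps only true merge commits (two parents), recovers each merged contribution as the commits reachable from one parent but not from the other, and applies the date and 500-commit cutoffs mentioned above; the class and helper names are ours, not the study tool's.

    import org.eclipse.jgit.api.Git;
    import org.eclipse.jgit.lib.Repository;
    import org.eclipse.jgit.revwalk.RevCommit;
    import org.eclipse.jgit.revwalk.RevWalk;

    import java.io.File;
    import java.time.Instant;
    import java.util.ArrayList;
    import java.util.List;

    public class MergeTaskMiner {

        // Commits reachable from 'side' but not from 'other': one merged contribution.
        static List<RevCommit> sideCommits(Repository repo, RevCommit side, RevCommit other)
                throws Exception {
            List<RevCommit> commits = new ArrayList<>();
            try (RevWalk walk = new RevWalk(repo)) {
                walk.markStart(walk.parseCommit(side.getId()));
                walk.markUninteresting(walk.parseCommit(other.getId()));
                for (RevCommit c : walk) commits.add(c);
            }
            return commits;
        }

        public static void main(String[] args) throws Exception {
            long cutoff = Instant.parse("2017-09-30T23:59:59Z").getEpochSecond();
            try (Git git = Git.open(new File(args[0]))) {
                Repository repo = git.getRepository();
                for (RevCommit merge : git.log().call()) {        // newest to oldest
                    if (merge.getParentCount() != 2) continue;     // true merge commits only
                    if (merge.getCommitTime() > cutoff) continue;  // up to Sept 30th, 2017
                    List<RevCommit> taskA =
                            sideCommits(repo, merge.getParent(0), merge.getParent(1));
                    List<RevCommit> taskB =
                            sideCommits(repo, merge.getParent(1), merge.getParent(0));
                    if (taskA.isEmpty() || taskB.isEmpty()) continue;       // fast-forward-like
                    if (taskA.size() > 500 || taskB.size() > 500) continue; // performance cutoff
                    // Next: keep tasks that change both production and Gherkin (.feature) files.
                }
            }
        }
    }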

Once we have a task set, we move on to the execution phase. First, we identify the acceptance tests of each task as the scenarios added or modified by the task's commits. Next, we compute TestI according to eight different configurations, as well as TaskI, and we evaluate precision and recall for each task.
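For concreteness, precision here is the fraction of files in the predicted interface (e.g., TestI) that the task actually changed (TaskI), and recall is the fraction of the task's changed files that the interface predicted. The short Java sketch below computes both over sets of file paths; the example file names are hypothetical.

    import java.util.HashSet;
    import java.util.Set;

    public class InterfaceMetrics {

        // precision = |predicted ∩ changed| / |predicted|
        static double precision(Set<String> predicted, Set<String> changed) {
            if (predicted.isEmpty()) return 0.0;
            Set<String> hits = new HashSet<>(predicted);
            hits.retainAll(changed);
            return (double) hits.size() / predicted.size();
        }

        // recall = |predicted ∩ changed| / |changed|
        static double recall(Set<String> predicted, Set<String> changed) {
            if (changed.isEmpty()) return 0.0;
            Set<String> hits = new HashSet<>(predicted);
            hits.retainAll(changed);
            return (double) hits.size() / changed.size();
        }

        public static void main(String[] args) {
            // Hypothetical task: the interface predicts three files, the task changed three.
            Set<String> testI = Set.of("app/models/user.rb",
                    "app/controllers/users_controller.rb", "app/views/users/show.html.erb");
            Set<String> taskI = Set.of("app/models/user.rb",
                    "app/controllers/users_controller.rb", "db/migrate/001_add_field.rb");
            System.out.printf("precision = %.2f, recall = %.2f%n",
                    precision(testI, taskI), recall(testI, taskI));
            // Two of three predicted files were changed: precision = 0.67, recall = 0.67.
        }
    }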