Final Thesis: Exploratory Data Analysis of Code Review Data

Abstract: Code review is a well-established and essential part of open source software development. In open source projects, the review process is often supported and monitored by dedicated review tools such as Gerrit. This thesis analyzes Gerrit review data from three open source projects (Gromacs, Go, and Typo3), focusing on the impact of highly active developers on the review process. To this end, indicators for identifying such developers are introduced and used to select a group of highly active developers, whose impact on the review process is then analyzed. The size of this group (the top-k developers) can be adjusted by applying a suitable threshold to the indicators. The findings show that a small group of top-k developers has a strong impact on the review process of the open source project they participate in (e.g., 1% of the 2492 contributors in the Typo3 project provide about 70% of the measured review content). In addition to this analysis, the thesis describes how the Gerrit review data were prepared and gives a short descriptive overview of them.

Keywords: code review, network theory, social diffusion
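The selection step described in the abstract (thresholding an activity indicator to obtain the top-k developers, then measuring their share of the review content) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes a single numeric indicator per developer (for example, the number of review comments written), and the function names `select_top_k` and `share_of_content` as well as the toy data are hypothetical. For scale, 1% of the 2492 Typo3 contributors corresponds to roughly 25 developers.

```python
from __future__ import annotations


def select_top_k(activity: dict[str, int], threshold: int) -> list[str]:
    """Return developers whose indicator value is at least `threshold`,
    most active first.

    `activity` maps a developer id to a hypothetical activity indicator
    (assumption: e.g. number of review comments); raising the threshold
    shrinks the selected group, lowering it grows the group.
    """
    selected = [dev for dev, value in activity.items() if value >= threshold]
    # Order by indicator value, highest first
    return sorted(selected, key=lambda dev: activity[dev], reverse=True)


def share_of_content(activity: dict[str, int], selected: list[str]) -> float:
    """Fraction of the total measured activity contributed by `selected`."""
    total = sum(activity.values())
    if total == 0:
        return 0.0
    return sum(activity[dev] for dev in selected) / total


if __name__ == "__main__":
    # Made-up toy numbers for illustration only, not data from the thesis
    activity = {"alice": 120, "bob": 40, "carol": 20, "dave": 12, "erin": 8}
    top = select_top_k(activity, threshold=40)
    print(top)                                         # ['alice', 'bob']
    print(f"{share_of_content(activity, top):.0%}")    # 80%
```

Here two of five developers (40%) account for 80% of the toy activity; with real review data, the threshold would be tuned until the selected group reaches the desired size.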

