Final Thesis: Measuring Patch-Flow on Github

Abstract: For Open Source Software (OS) projects, collaboration is key to success, as projects that collaborate less tend to make less progress. Patches from other OS projects can raise a project's code quality or extend its functionality. Several papers in the literature examine the extent of collaboration in OS projects, yet most of these studies do not cover collaboration between different projects. To understand the collaboration between OS projects, Source Code Management (SCM) repositories are an essential source. Repositories are connected by patches, and these connections can be obtained by mining the projects' repositories. Measuring these connections is difficult, however, because a repository does not record where its patches come from or where they go. Collaboration between OS projects can be expressed as so-called Patch Flow. As a representative example of the OS world, I use GitHub.com as the data source. I examine to what extent Patch Flow exists between repositories and which circumstances influence it. Further, I introduce a model that represents Patch Flow in detail. Based on this model, I developed a crawler to collect data from GitHub.com repositories. The analysis of the gathered data shows that Patch Flow between OS projects exists, and the numbers suggest that collaboration among projects is common in OS projects.

Keywords: Measuring collaboration, mining software repositories, software analytics, open source


Reference: Manuel Frederic Zerpies. Measuring Patch-Flow on Github. Master Thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg, 2015.