Final Thesis: A Comparison Study of Open Source License Crawler

Abstract: To include open source software in a project, a software developer must abide by the license under which the software is published. However, there are no clear guidelines for license placement in open source projects. As a result, the location of the relevant licensing text varies from project to project, which makes license identification a difficult task. One proposed solution to this issue is the use of license crawlers, tools designed to search project directories for licensing information. This thesis evaluates the existing license crawlers for their functionality and performance in order to find out whether a sufficient solution to the problem exists.

To do so, we performed a two-phase benchmark with six license crawlers and 75 open source projects. First, we determined which of the software tools found the most licenses in a direct comparison. Second, we evaluated conflict situations in the output of the best-performing license crawlers.

Our results show that FOSSology and ScanCode performed the most reliably. Looking at the conflict situations, we also determined that FOSSology made fewer errors in its evaluation. However, we also found four error categories to which the crawlers are especially susceptible.

Keywords: license confusion, open source, license conflicts

PDFs: Bachelor Thesis, Work Description

Reference: Thomas Wolter. A Comparison Study of Open Source License Crawler. Bachelor Thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg: 2019.