Research Paper: Open Source License Inconsistencies on GitHub

Abstract: Almost all software, open or closed, builds on open source software and therefore needs to comply with the license obligations of the open source code. Not knowing which licenses to comply with poses a legal risk to anyone using open source software. This article investigates the extent of inconsistencies between the licenses declared by an open source project at the top level of its repository and the licenses found in the code. We analysed a sample of 1,000 open source GitHub repositories. We find that about half of the repositories did not fully declare all licenses found in the code. Of these, approximately ten percent represented a permissive vs. copyleft license mismatch. Furthermore, existing tools cannot fully identify licenses. We conclude that users of open source code should not only look at the declared licenses of the open source code they intend to use, but also examine the software to understand its actual licenses.
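The comparison the abstract describes, declared top-level licenses versus licenses found in the code, can be illustrated with a small set-based check. The following is a minimal sketch, not the authors' tooling: it assumes licenses are already expressed as SPDX identifiers (for example, produced by a separate license scanner), and the names `check_repository`, `LicenseReport`, and the tiny `PERMISSIVE`/`COPYLEFT` tables are hypothetical simplifications. Real license classification is more nuanced (weak vs. strong copyleft, dual licensing, license expressions).

```python
from dataclasses import dataclass

# Hypothetical, deliberately simplified classification of a few SPDX identifiers.
# Real categorization distinguishes e.g. weak copyleft (MPL, LGPL) from strong copyleft (GPL).
PERMISSIVE = {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0", "ISC"}
COPYLEFT = {"GPL-2.0-only", "GPL-3.0-only", "LGPL-2.1-only", "AGPL-3.0-only", "MPL-2.0"}
OPPOSITE = {"permissive": "copyleft", "copyleft": "permissive"}


def category(spdx_id: str) -> str:
    """Map an SPDX identifier to a coarse category (assumption: lookup tables above)."""
    if spdx_id in PERMISSIVE:
        return "permissive"
    if spdx_id in COPYLEFT:
        return "copyleft"
    return "unknown"


@dataclass
class LicenseReport:
    undeclared: set[str]     # licenses found in the code but not declared at the top level
    category_mismatch: bool  # an undeclared license is of the opposite category to a declared one


def check_repository(declared: set[str], found: set[str]) -> LicenseReport:
    """Report licenses found in the code that the top-level declaration omits."""
    undeclared = found - declared
    declared_categories = {category(lic) for lic in declared}
    # Flag a permissive vs. copyleft mismatch only when the declaration contains
    # the opposite category, e.g. a repository declared MIT that contains GPL code.
    mismatch = any(
        category(lic) in OPPOSITE
        and category(lic) not in declared_categories
        and OPPOSITE[category(lic)] in declared_categories
        for lic in undeclared
    )
    return LicenseReport(undeclared=undeclared, category_mismatch=mismatch)
```

For instance, `check_repository({"MIT"}, {"MIT", "GPL-3.0-only"})` reports `GPL-3.0-only` as undeclared and flags a category mismatch, whereas an undeclared `BSD-3-Clause` file in an MIT-declared repository is an inconsistency without a permissive vs. copyleft mismatch.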

Keywords: License management, license conflicts

Reference: Thomas Wolter, Ann Barcomb, Dirk Riehle and Nikolay Harutyunyan (2022). Open Source License Inconsistencies on GitHub. ACM Transactions on Software Engineering and Methodology (TOSEM).

The paper is available as a PDF.