Final Thesis: Integrating Open Data License Information into Data Pipelines

Abstract: A large amount of open data is available on the internet today. To use this data to its full potential, it is often necessary to combine multiple datasets into new ones. An important part of this is understanding and complying with the requirements of the licenses attached to the source datasets. This thesis presents a solution for integrating open data license information into a data pipeline. Following a design science approach, we construct a framework that can be used to model and compare open data licenses. To this end, existing frameworks and solutions are compared and analyzed to find an approach suited to open data licenses. The framework can check two licenses for compatibility, aggregate them into a composite license, and recommend an existing license if the composite license matches one. The thesis also includes an implementation of the framework as an extensible library and a demonstration, based on a website prototype, of how the framework can be used to generate reports about different data licenses. The framework is evaluated by comparing its results against existing frameworks from the literature and by testing whether other licenses could be included in it as well.
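
Illustrative sketch (not taken from the thesis): the three operations named in the abstract, namely compatibility checking, aggregation into a composite license, and recommendation of a matching existing license, could be modeled roughly as below. The `License` class, the function names, the term vocabulary, and the simplified license terms are all assumptions made for this example; they do not reflect the thesis library's actual API or a legally accurate model of any license.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class License:
    """Hypothetical license model: three sets of abstract terms."""
    name: str
    permissions: frozenset
    obligations: frozenset
    prohibitions: frozenset


def are_compatible(a: License, b: License) -> bool:
    """Assumed rule: two licenses conflict if one obliges what the other prohibits."""
    return not (a.obligations & b.prohibitions) and not (b.obligations & a.prohibitions)


def compose(a: License, b: License) -> License:
    """Aggregate two compatible licenses into a composite license.

    Assumed rule: the composite keeps only permissions granted by both,
    and carries the obligations and prohibitions of both.
    """
    if not are_compatible(a, b):
        raise ValueError(f"{a.name} and {b.name} are not compatible")
    prohibitions = a.prohibitions | b.prohibitions
    return License(
        name=f"Composite({a.name}, {b.name})",
        permissions=(a.permissions & b.permissions) - prohibitions,
        obligations=a.obligations | b.obligations,
        prohibitions=prohibitions,
    )


def recommend(composite: License, known: Iterable[License]) -> Optional[License]:
    """Return a known license whose terms equal the composite's terms, if any."""
    for lic in known:
        if (lic.permissions, lic.obligations, lic.prohibitions) == (
            composite.permissions, composite.obligations, composite.prohibitions
        ):
            return lic
    return None


# Heavily simplified, illustrative terms (assumptions, not legal modeling).
USE_SHARE_MODIFY = frozenset({"use", "share", "modify"})
CC0 = License("CC0", USE_SHARE_MODIFY, frozenset(), frozenset())
CC_BY = License("CC BY", USE_SHARE_MODIFY, frozenset({"attribution"}), frozenset())

# Purely hypothetical licenses that conflict with each other.
EXAMPLE_A = License("EXAMPLE-A", USE_SHARE_MODIFY, frozenset({"publish_changes"}), frozenset())
EXAMPLE_B = License("EXAMPLE-B", frozenset({"use"}), frozenset(), frozenset({"publish_changes"}))

composite = compose(CC0, CC_BY)
match = recommend(composite, [CC0, CC_BY])
```

Under these assumed rules, the composite of the two simplified Creative Commons entries carries the attribution obligation and therefore matches the simplified CC BY entry, while `EXAMPLE_A` and `EXAMPLE_B` are rejected as incompatible because one obliges what the other prohibits.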

Keywords: Open Data, Open Data Licenses, Data Engineering, JValue

PDF: Master Thesis

Reference: Philip Rebbe. Integrating Open Data License Information into Data Pipelines. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2024.