Final Thesis: A Study and Analysis of the Performance of the JValue Open Data Service as Part of a Data Pipeline Supporting An Online Learning Model
Abstract: Open data is known for data quality issues that require complex data cleansing and transformation before it can be used for data analysis, data visualization, training machine learning algorithms, and other data science activities. The Open Data Service (ODS) is a software project that aims to create an interface for the reliable and safe consumption of open data. It does so by providing the tooling and infrastructure needed to collaborate on eliminating obstacles to the usability of open data. ODS has gone through several development cycles to better serve its purposes. One of these purposes is to function as an extract, transform, load (ETL) tool that consumes open data from different sources and adapts it to different needs. In this work, we evaluate and analyze the performance of ODS in this role, specifically as part of a data pipeline supporting a real-world data science application.
PDF: Master Thesis
Reference: Shady Hegazy. Study and Analysis of the Performance of JValue Open Data Service as Part of a Data Pipeline Supporting An Online Learning Model. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2022.