Research Paper: Challenges to Open Collaborative Data Engineering

Abstract: Open data is data that can be used, modified, and passed on for free, similar to open-source software. Unlike open-source software, however, there is little collaboration in open data engineering. We perform a systematic literature review of collaboration systems in open data. We focus specifically on data engineering performed by users after data has been made available as open data. The results show that open data users perform a wide range of activities to acquire, understand, process, and maintain data for their projects. They do so without established best practices or standardized tools for open collaboration. We identify and discuss technical, community, and process challenges to collaboration in data engineering for open data.

Keywords: Collaboration, data engineering, open data, literature review.

Reference: Philip Heltweg and Dirk Riehle. 2023. Challenges to Open Collaborative Data Engineering. In Proceedings of the 56th Hawaii International Conference on System Sciences (HICSS 2023), forthcoming. Maui, USA.

A preprint of the paper is available as a PDF.