Final Thesis: Design of an Open-Source Data Lakehouse Architecture for Software Development Analytics

Abstract: Software engineering involves the use of many tools that inherently generate valuable data. Analyzing this data is challenging because it must be retrieved from various sources and enriched before it yields analytical insights. While specific problems within the software development process have been studied extensively, comparatively little research has focused on building a scalable platform to support software development analytics. This thesis explores the applicability of a modern data lakehouse architecture to software development analytics. Following a design science approach, it develops a modular, scalable, and extensible data lakehouse architecture based on Apache Spark, Delta Lake, and S3. The solution builds upon the existing system of a small development team and extends it into a complete data lakehouse following the Medallion architecture, with a structured Data Vault-based Silver layer and a Star Schema-based Gold layer. To that existing system, the prototype adds the Data Vault-based Silver layer, orchestration via Apache Airflow, and an S3-compatible object store. Although the prototype implements a single-tenant solution, the thesis discusses in detail how the system can be extended to multi-tenancy. The prototype’s applicability was successfully verified using development data from GitHub, GitLab, and Jira. The architecture was evaluated through a structured walkthrough with an experienced software architect and documented using the industry-proven arc42 architecture documentation template.

Keywords: software-architecture, software-development, analytics, design-science

PDF: Master Thesis

Reference: Dominic Rouven Fischer. Design of an Open-Source Data Lakehouse Architecture for Software Development Analytics. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2025.

