Abstract: Open source components dominate modern software applications, making license compliance a critical challenge for organizations. While automated processes in Software Composition Analysis (SCA) tools are essential for managing this complexity, they face inherent limitations in accuracy when detecting licenses and copyright statements, managing package metadata, and identifying dependencies. This master’s thesis addresses these limitations by designing and implementing comprehensive data curation capabilities for SCA Tool, enabling human-in-the-loop workflows that combine automated analysis with manual review. The implementation introduces two primary curation workflows: package metadata correction and scanner finding curation. The metadata correction system allows users to override incomplete or incorrect package information at multiple levels. The scanner finding curation functionality provides an intuitive interface for reviewing and correcting license and copyright findings, with support for auditing, bulk operations, hotkeys, and automatic reuse of decisions across identical files. The resulting system enables organizations to achieve the accuracy required for license compliance by efficiently combining automated analysis with targeted human expertise.
PDF: Master Thesis
Reference: Lukas Nehrke. Data Curation in SCA Tool for Improved Software Composition. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2025.