Final Thesis: Merging and Anonymizing User Identities in Software Development

Abstract: Data extracted from software engineering tools, such as GitHub, GitLab, and Jira, can serve as an important source for metrics and studies. However, to conduct accurate studies, we need to unify the non-uniform identities of developers across those platforms. Additionally, identities need to be anonymized before research results are released. While solutions are available, none meets the criteria for dependency management and data anonymization required by the MECOIS project. This thesis presents the design and implementation of an internal identity matching tool that removes the reliance on third-party tools. The prototype implements a three-phase matching algorithm that compares email addresses to discover related profiles, automatically merges high-confidence matches, and flags uncertain cases for manual review. The system implements a PostgreSQL database schema that uses UUIDs for anonymization and integrates into the existing MECOIS data pipeline. The implementation is flexible, allowing the use of different similarity algorithms without changing the core logic. The prototype was evaluated against the defined functional and non-functional requirements, and the evaluation showed that the core identity matching functions work as required. The system provides MECOIS with independence from external tools while retaining full control of the identity management and anonymization processes.
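
To make the idea in the abstract more concrete, the following is a minimal sketch of threshold-based identity matching with a pluggable similarity function and UUID pseudonyms. It is not the thesis implementation: the function names, the two thresholds (0.9 and 0.6), and the use of Python's difflib on the local part of the email address are all assumptions chosen for illustration, and the sample addresses are made up.

```python
import uuid
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical type: any function mapping two emails to a score in [0, 1].
Similarity = Callable[[str, str], float]


def local_part_similarity(a: str, b: str) -> float:
    """Compare the parts before '@', case-insensitively (illustrative choice)."""
    left = a.split("@")[0].lower()
    right = b.split("@")[0].lower()
    return SequenceMatcher(None, left, right).ratio()


def match_identities(
    emails: list[str],
    similarity: Similarity = local_part_similarity,
    merge_at: float = 0.9,   # assumed threshold for automatic merging
    review_at: float = 0.6,  # assumed threshold for manual review
) -> tuple[dict[str, str], list[tuple[str, str, float]]]:
    """Assign a UUID pseudonym to each email and collect uncertain pairs."""
    ids: dict[str, str] = {}
    review: list[tuple[str, str, float]] = []
    for email in emails:
        # Find the most similar address seen so far.
        best, best_score = None, 0.0
        for known in ids:
            score = similarity(email, known)
            if score > best_score:
                best, best_score = known, score
        if best is not None and best_score >= merge_at:
            # High confidence: reuse the existing pseudonym.
            ids[email] = ids[best]
        else:
            # Otherwise create a new pseudonym; flag mid-range scores.
            ids[email] = str(uuid.uuid4())
            if best is not None and best_score >= review_at:
                review.append((email, best, best_score))
    return ids, review


emails = [
    "jane.doe@example.com",
    "jane.doe@users.noreply.github.com",
    "jdoe@example.org",
    "bob@example.com",
]
ids, review = match_identities(emails)
```

Because the similarity function is passed in as a parameter, a different comparison (for example exact matching) can be substituted without touching the merging and flagging logic, which mirrors the flexibility the abstract describes.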

Keywords: Analytics, Software-Development, Identity-Management

PDF: Master Thesis

Reference: Alexander Sacharenko. Merging and Anonymizing User Identities in Software Development. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2025.

