Abstract: This thesis proposes a proof of concept for identifying Development, Enhancement, Maintenance, Protection, and Exploitation (DEMPE) functions in code contributions using machine learning techniques. Following the Design Science Research Methodology (DSRM), the study creates and assesses a deterministic classification engine (the artifact) for commit messages. Several supervised machine learning models (Logistic Regression, Random Forest, XGBoost, Neural Networks, and classifier chains) were trained and systematically evaluated. The dataset, sourced from eight selected GitHub repositories, was manually labeled based on the DEMPE function definitions and conventional commit tag guidelines. After labeling, the data were cleaned and analyzed for class imbalance. To address this imbalance and improve model generalization, the Multi-label Synthetic Minority Oversampling Technique (MLSMOTE) was applied. Subsequently, Sentence-BERT (SBERT) with the all-MiniLM-L6-v2 model was employed to generate semantically meaningful vector representations of the commit messages. These embeddings capture the contextual meaning of sentences, enabling the models to learn from the underlying semantics rather than relying solely on surface-level text features. We also developed a command-line interface (CLI) tool to reproduce the results. The tool supports fetching data from sources, data extraction, data preprocessing, model training, and real-time prediction, and it accommodates both conventional and non-conventional commit messages, providing a practical solution for classifying commits. Among the models tested, the Logistic Regression classifier using a One-vs-Rest strategy delivered the best performance, achieving an average F1-score of 0.916 across DEMPE classes.
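To make the reported metric concrete, the following is a minimal sketch of how an average F1-score across the five DEMPE classes could be computed for multi-label predictions. It assumes an unweighted (macro) average over per-class F1 scores; the abstract does not state which averaging scheme the thesis uses. The function names and the label-set representation are hypothetical and are not taken from the thesis or its CLI tool.

```python
from typing import Dict, List, Set

# The five DEMPE function labels named in the abstract.
DEMPE = ["Development", "Enhancement", "Maintenance", "Protection", "Exploitation"]


def per_class_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> Dict[str, float]:
    """Compute F1 separately for each DEMPE label (one-vs-rest view).

    Each element of y_true / y_pred is the set of labels assigned to one commit.
    """
    scores: Dict[str, float] = {}
    for label in DEMPE:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # Assumption: a label with no true and no predicted instances scores 0.0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)
```

Because every class contributes equally to a macro average, weak performance on a rare class lowers the overall score as much as weak performance on a frequent one, which is one reason class imbalance matters for this kind of evaluation.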
Keywords: software-analytics, classification, machine-learning
PDF: Master Thesis
Reference: Arni Islam. Deterministic Classification of Accounting Functions in Code Contributions. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2025.