Abstract: Estimating the cost of a software development project is a persistent challenge. It is particularly relevant for projects developed by multiple contributors within a company (Inner Source), where costs must be allocated across several cost centres for management purposes and for tax and transfer-pricing documentation. Simple proxies, such as the number of commits or lines of code alone, do not capture the actual effort involved. This thesis integrates a previously evaluated algorithm, which uses Git repository metadata to estimate the time invested per commit, into the MECOIS research project and extends it with the following: disassembly of squashed commits; filtering of no-effort commits and statistical outliers; a regression from changed lines of code to contribution time; support for co-authors; and time zone normalization. The extensions are evaluated against the base version using distributional statistics and an internal plausibility check, which identifies and caps implausibly large contribution times assigned to commits. The improved version reduces extreme estimates substantially: the median contribution time and the standard deviation per commit drop significantly for both tested public repositories. The number of plausibility-cap events is reduced by more than 24 %. These improvements yield more robust and reliable estimates of the time invested per commit, strengthening the evidential basis for cost calculation and transfer pricing.
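To make the idea of estimating time from commit metadata concrete, the following is a minimal sketch and not the algorithm from the thesis. It assumes a common heuristic: the time attributed to a commit is the gap to the same author's previous commit, with timestamps first normalized to UTC and implausibly large gaps capped. The function name `estimate_times` and the constants `MAX_GAP` and `FIRST_COMMIT` are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical values, chosen for illustration only (not taken from the thesis).
MAX_GAP = timedelta(hours=2)          # cap for implausibly large gaps
FIRST_COMMIT = timedelta(minutes=30)  # default for an author's first commit


def estimate_times(commits):
    """Map each commit SHA to an estimated contribution time.

    `commits` is a list of (sha, author, iso_timestamp) tuples, where the
    timestamp carries a UTC offset, e.g. "2024-03-01T09:00:00+01:00".
    """
    by_author = {}
    for sha, author, ts in commits:
        # Time zone normalization: convert every timestamp to UTC.
        utc = datetime.fromisoformat(ts).astimezone(timezone.utc)
        by_author.setdefault(author, []).append((utc, sha))

    estimates = {}
    for entries in by_author.values():
        entries.sort()  # chronological order per author
        previous = None
        for utc, sha in entries:
            if previous is None:
                estimates[sha] = FIRST_COMMIT
            else:
                # Gap to the previous commit, capped at MAX_GAP.
                estimates[sha] = min(utc - previous, MAX_GAP)
            previous = utc
    return estimates


commits = [
    ("a", "alice", "2024-03-01T09:00:00+01:00"),  # 08:00 UTC
    ("b", "alice", "2024-03-01T08:45:00+00:00"),  # 08:45 UTC
    ("c", "alice", "2024-03-01T20:00:00+00:00"),  # long gap, capped
]
result = estimate_times(commits)
```

Without the UTC conversion, commit "a" would sort after commit "b" and the gap between them would be wrong, which is why normalization comes before any gap computation in this sketch.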
PDF: Master Thesis
Reference: Mathieu Stenzel. Estimating Software Contribution Time from Git and GitHub Metadata. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2026.