Open Source Research Opinion: Quantitative Process Research

We will be participating in the Computing Community Consortium’s workshop on the future of open source research at UC Irvine. The organizers asked participants to provide opinions on three research areas that warrant further attention. Here is our first one:

1. Quantitative Analyses of Actual Programmer Behavior

We design software development processes and tools based on what we think best helps programmers and their work. The design of a process and its supporting tools therefore reflects the designers' beliefs about what holds true in software development. Until the advent of empirical software engineering, deciding what makes a good process or a good tool was largely guesswork. Even early empirical software engineering research found it hard to generalize from one project to another, not to mention to obtain sufficient data in the first place.

Open source has changed that. We now have extensive data at hand for analyzing actual software development processes and actual programmer behavior. We should analyze this data along all dimensions and compare the results with what people previously believed to be true about software development (as applied to open source). First, this will give us a better understanding of how things actually work, from which we can generate new insights and new hypotheses. Second, by comparing actual data with the previously held beliefs that influenced process and tool design, we can uncover problems with those designs and improve them.
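To make the second point concrete, here is a minimal sketch of checking one belief against data. Everything in it is an assumption for illustration: the belief ("most commits happen during office hours"), the definition of office hours, the function names, and the handful of made-up timestamps. In a real study the timestamps would come from a project's version control history.

```python
from datetime import datetime
from typing import List

# Hypothetical commit timestamps (committer's local time), invented for
# illustration only; they are not taken from any real project.
commit_times = [
    "2009-03-02 10:15",  # Monday
    "2009-03-02 22:40",  # Monday
    "2009-03-04 14:05",  # Wednesday
    "2009-03-07 11:30",  # Saturday
    "2009-03-08 23:55",  # Sunday
    "2009-03-10 09:00",  # Tuesday
]


def in_office_hours(ts: datetime) -> bool:
    """Assumed definition: Monday to Friday, from 09:00 up to (not including) 17:00."""
    return ts.weekday() < 5 and 9 <= ts.hour < 17


def office_hours_share(timestamps: List[str]) -> float:
    """Return the fraction of commits that fall inside the assumed office hours."""
    parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M") for t in timestamps]
    return sum(in_office_hours(p) for p in parsed) / len(parsed)


share = office_hours_share(commit_times)
print(f"Share of commits during office hours: {share:.0%}")
```

If a tool or process was designed on the assumption that nearly all work happens during office hours, a share far below that in real project data would be a reason to revisit the design.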

Several consequences follow: (a) The research community needs a broad, commonly shared database of project data; efforts like FLOSSmole are only the beginning. (b) The software engineering research community needs to brush up its empirical research skills, because unearthing beliefs requires methods previously unknown or unused in computer science. (c) Publication venues need to be more accommodating of research that takes a natural science or social science perspective rather than an engineering perspective. (d) The community needs to find ways to apply the same research efforts to closed source software.