Evaluating the Quality and Quantity of Data on Open Source Software Projects
In this paper, we provide a preliminary evaluation of the quality and quantity of data on open source (OS) projects available at the SourceForge.net portal. We derived a dataset of approximately 50,000 projects from SourceForge. Using several indicators of project activity, we identify two samples from the entire dataset: the most active OS projects (a total of 456 projects, ~0.9% of the entire dataset), and those projects with active code development (5826 projects, ~11.6%). Projects that are active across all of our main indicators of activity account for less than 1% of the projects on the portal. This suggests that many OS projects registered on SourceForge are impulse projects, which do not attract sufficient interest from developers or users to become active and successful. It also suggests that researchers, developers, and users should be careful about how they use OS portals.