Evaluating the Quality and Quantity of Data on Open Source Software Projects
In this paper, we provide a preliminary evaluation of the quality and quantity of data on open source (OS) projects available at the SourceForge.net portal. We have derived a dataset of approximately 50,000 projects from SourceForge. Using several indicators of project activity, we identify two samples from the entire dataset: the most active OS projects (a total of 456 projects, ~0.9% of the entire dataset), and those projects with active code development (5826 projects, ~11.6%). Projects that are active across all of our main indicators of activity account for less than 1% of the projects on the portal. This suggests that many OS projects registered on SourceForge are impulse projects, which do not gather sufficient interest from developers or users to become active and successful. It also suggests that researchers, developers and users should be careful about how they use OS portals.
| Item Type | Conference or Workshop Item (Other) | 
|---|---|
| Date Deposited | 15 May 2025 16:33 | 
| Last Modified | 22 Oct 2025 20:00 | 
| File | 902200.pdf (Submitted Version) |