Field | Value | Language
dc.contributor.author | Gray, David |
dc.contributor.author | Sun, Yi |
dc.contributor.author | Davey, N. |
dc.contributor.author | Christianson, B. |
dc.contributor.author | Bowes, David |
dc.identifier.citation | Gray, D, Sun, Y, Davey, N, Christianson, B & Bowes, D 2012, 'Reflections on the NASA MDP data sets', IET Software, vol. 6, no. 6, pp. 549-558. |
dc.identifier.other | PURE: 1058411 |
dc.identifier.other | PURE UUID: 0d1078df-8032-475f-b769-d05e343f6367 |
dc.identifier.other | Scopus: 84870449427 |
dc.description.abstract | Background: The NASA Metrics Data Program (MDP) data sets have been heavily used in software defect prediction research. Aim: To highlight the data quality issues present in these data sets, and the problems that can arise when they are used in a binary classification context. Method: A thorough exploration of all 13 original NASA data sets, followed by various experiments demonstrating the potential impact of duplicate data points when data mining. Conclusions: One: Researchers need to analyse the data that forms the basis of their findings in the context of how it will be used. Two: The bulk of defect prediction experiments based on the NASA MDP data sets may have led to erroneous findings. This is mainly due to repeated/duplicate data points potentially causing substantial amounts of training and testing data to be identical. | en
dc.relation.ispartof | IET Software |
dc.title | Reflections on the NASA MDP data sets | en
dc.contributor.institution | School of Computer Science |
dc.contributor.institution | Science & Technology Research Institute |
dc.contributor.institution | Centre for Computer Science and Informatics Research |
dc.description.status | Peer reviewed |
dc.relation.school | School of Computer Science |
rioxxterms.type | Journal Article/Review |

Files in this item

There are no files associated with this item.