dc.contributor.author: Gray, David
dc.contributor.author: Sun, Yi
dc.contributor.author: Davey, N.
dc.contributor.author: Christianson, B.
dc.contributor.author: Bowes, David
dc.date.accessioned: 2012-12-18T12:29:37Z
dc.date.available: 2012-12-18T12:29:37Z
dc.date.issued: 2012
dc.identifier.citation: Gray, D, Sun, Y, Davey, N, Christianson, B & Bowes, D 2012, 'Reflections on the NASA MDP data sets', IET Software, vol. 6, no. 6, pp. 549-558. https://doi.org/10.1049/iet-sen.2011.0132
dc.identifier.issn: 1751-8814
dc.identifier.other: PURE: 1058411
dc.identifier.other: PURE UUID: 0d1078df-8032-475f-b769-d05e343f6367
dc.identifier.other: Scopus: 84870449427
dc.identifier.uri: http://hdl.handle.net/2299/9441
dc.description.abstract: Background: The NASA Metrics Data Program (MDP) data sets have been heavily used in software defect prediction research. Aim: To highlight the data quality issues present in these data sets, and the problems that can arise when they are used in a binary classification context. Method: A thorough exploration of all 13 original NASA data sets, followed by various experiments demonstrating the potential impact of duplicate data points when data mining. Conclusions: One: Researchers need to analyse the data that forms the basis of their findings in the context of how it will be used. Two: The bulk of defect prediction experiments based on the NASA MDP data sets may have led to erroneous findings. This is mainly due to repeated/duplicate data points potentially causing substantial amounts of training and testing data to be identical. [en]
dc.format.extent: 10
dc.language.iso: eng
dc.relation.ispartof: IET Software
dc.title: Reflections on the NASA MDP data sets [en]
dc.contributor.institution: School of Computer Science
dc.contributor.institution: Science & Technology Research Institute
dc.contributor.institution: Centre for Computer Science and Informatics Research
dc.description.status: Peer reviewed
dc.relation.school: School of Computer Science
dcterms.dateAccepted: 2012
rioxxterms.version: VoR
rioxxterms.versionofrecord: https://doi.org/10.1049/iet-sen.2011.0132
rioxxterms.type: Journal Article/Review
herts.preservation.rarelyaccessed: true


Files in this item

There are no files associated with this item.