
dc.contributor.author | Gray, D.
dc.contributor.author | Bowes, David
dc.contributor.author | Davey, N.
dc.contributor.author | Sun, Yi
dc.contributor.author | Christianson, B.
dc.date.accessioned | 2013-01-10T15:59:14Z
dc.date.available | 2013-01-10T15:59:14Z
dc.date.issued | 2011
dc.identifier.citation | Gray, D, Bowes, D, Davey, N, Sun, Y & Christianson, B 2011, The misuse of the NASA metrics data program data sets for automated software defect prediction. in Procs 15th Annual Conference on Evaluation & Assessment in Software Engineering (EASE 2011). IET, pp. 96-103, Proceedings of the 15th International Conference on Evaluation and Assessment in Software Engineering, Durham, United Kingdom, 11/04/11. https://doi.org/10.1049/ic.2011.0012
dc.identifier.citation | conference
dc.identifier.isbn | 978-1-84919-509-6
dc.identifier.other | PURE: 1406711
dc.identifier.other | PURE UUID: 815aa606-b193-4e3d-ad8d-0ed912023340
dc.identifier.other | Scopus: 82955251102
dc.identifier.uri | http://hdl.handle.net/2299/9552
dc.description.abstract | Background: The NASA Metrics Data Program data sets have been heavily used in software defect prediction experiments. Aim: To demonstrate and explain why these data sets require significant pre-processing in order to be suitable for defect prediction. Method: A meticulously documented data cleansing process involving all 13 of the original NASA data sets. Results: After our novel data cleansing process, each of the data sets had between 6 and 90 percent fewer recorded values than the original. Conclusions: One: Researchers need to analyse the data that forms the basis of their findings in the context of how it will be used. Two: Defect prediction data sets could benefit from lower-level code metrics in addition to those more commonly used, as these will help to distinguish modules, reducing the likelihood of repeated data points. Three: The bulk of defect prediction experiments based on the NASA Metrics Data Program data sets may have led to erroneous findings. This is mainly due to repeated data points potentially causing substantial amounts of training and testing data to be identical. | en
dc.language.iso | eng
dc.publisher | IET
dc.relation.ispartof | Procs 15th Annual Conference on Evaluation & Assessment in Software Engineering (EASE 2011)
dc.rights | Open
dc.title | The misuse of the NASA metrics data program data sets for automated software defect prediction | en
dc.contributor.institution | School of Engineering and Technology
dc.contributor.institution | Science & Technology Research Institute
dc.contributor.institution | School of Computer Science
dc.contributor.institution | Centre for Computer Science and Informatics Research
dc.relation.school | School of Engineering and Technology
dc.relation.school | School of Computer Science
dc.description.versiontype | Final Accepted Version
dcterms.dateAccepted | 2011
rioxxterms.version | AM
rioxxterms.versionofrecord | https://doi.org/10.1049/ic.2011.0012
rioxxterms.type | Other
herts.preservation.rarelyaccessed | true
herts.rights.accesstype | Open
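The abstract's third conclusion rests on a simple mechanism: when a data set contains many identical rows, a random train/test split is likely to put copies of the same row on both sides, so a classifier is partly evaluated on data it has already seen. The following is a minimal Python sketch of that effect, assuming invented toy rows (not values from the NASA MDP data sets) and hypothetical helper names (`toy_dataset`, `remove_repeats`, `percent_reduction`, `train_test_overlap`). It shows only exact-duplicate removal and is not the authors' documented cleansing process.

```python
import random

# Hypothetical toy rows: (metric_a, metric_b, metric_c, defective).
# Invented for illustration only; NOT values from the NASA MDP data sets.
def toy_dataset():
    return [
        (1, 1, 0, False),
        (1, 1, 0, False),
        (1, 1, 0, False),
        (3, 2, 1, False),
        (10, 7, 4, True),
        (1, 1, 0, False),
        (5, 3, 2, False),
        (10, 7, 4, True),
    ]


def remove_repeats(rows):
    """Keep only the first occurrence of each identical row (metrics plus label)."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def percent_reduction(before, after):
    """Percentage of rows removed by cleaning."""
    return 100.0 * (before - after) / before


def train_test_overlap(rows, test_fraction=0.5, seed=0):
    """Shuffle, split, and return the fraction of test rows that also occur in training."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    train, test = shuffled[:cut], shuffled[cut:]
    if not test:
        return 0.0
    train_rows = set(train)
    return sum(row in train_rows for row in test) / len(test)


if __name__ == "__main__":
    raw = toy_dataset()
    cleaned = remove_repeats(raw)
    print(len(raw), "->", len(cleaned), "rows")              # 8 -> 4 rows
    print(percent_reduction(len(raw), len(cleaned)), "%")     # 50.0 %
    print("overlap before cleaning:", train_test_overlap(raw))
    print("overlap after cleaning:", train_test_overlap(cleaned))  # 0.0
```

Rows are stored as tuples so they can be hashed into a set, which makes both duplicate removal and the overlap check linear in the number of rows. On the raw toy data a random split can place copies of the same row in both partitions; once exact duplicates are removed, no test row can appear verbatim in the training set, so the overlap is always zero.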

