Show simple item record

dc.contributor.author: Gray, David Philip Harry
dc.date.accessioned: 2013-07-03T14:22:52Z
dc.date.available: 2013-07-03T14:22:52Z
dc.date.issued: 2013-06-24
dc.identifier.uri: http://hdl.handle.net/2299/11067
dc.description.abstract: Software defect prediction is motivated by the huge costs incurred as a result of software failures. In an effort to reduce these costs, researchers have been utilising software metrics to try to build predictive models capable of locating the most defect-prone parts of a system. These areas can then be subject to some form of further analysis, such as a manual code review. It is hoped that such defect predictors will enable software to be produced more cost-effectively, and/or be of higher quality. In this dissertation I identify many data quality and methodological issues in previous defect prediction studies. The main data source is the NASA Metrics Data Program Repository. The issues discovered with these well-utilised data sets include many examples of seemingly impossible values, and much redundant data. These redundant, or repeated, data points are shown to be the cause of potentially serious data mining problems. Other methodological issues discovered include the violation of basic data mining principles, and the misleading reporting of classifier predictive performance. The issues discovered lead to a new proposed methodology for software defect prediction. The methodology is focused around data analysis, as this appears to have been overlooked in many prior studies. The aim of the methodology is to obtain a realistic estimate of potential real-world predictive performance, and to provide simple performance baselines against which the actual performance achieved can be compared. This is important because quantifying predictive performance appropriately is a difficult task. The findings of this dissertation raise questions about the current defect prediction body of knowledge. So many data-related and/or methodological errors have previously occurred that it may now be time to revisit the fundamental aspects of this research area, to determine what we really know, and how we should proceed. [en_US]
dc.language.iso: en [en_US]
dc.publisher: University of Hertfordshire [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: software engineering [en_US]
dc.subject: machine learning [en_US]
dc.subject: defect prediction [en_US]
dc.subject: fault prediction [en_US]
dc.subject: code metrics [en_US]
dc.subject: data quality [en_US]
dc.title: Software Defect Prediction Using Static Code Metrics: Formulating a Methodology [en_US]
dc.type: info:eu-repo/semantics/doctoralThesis [en_US]
dc.identifier.doi: 10.18745/th.11067
dc.type.qualificationlevel: Doctoral [en_US]
dc.type.qualificationname: PhD [en_US]
herts.preservation.rarelyaccessed: true

