Show simple item record

dc.contributor.author  Bowes, David Hutchinson
dc.date.accessioned  2013-06-27T13:14:49Z
dc.date.available  2013-06-27T13:14:49Z
dc.date.issued  2013-06-13
dc.identifier.uri  http://hdl.handle.net/2299/10978
dc.description.abstract  Context. Reports suggest that defects in code cost the US in excess of $50 billion per year to put right. Defect prediction is an important part of software engineering. It allows developers to prioritise the code that needs to be inspected when trying to reduce the number of defects in code. A small change in the number of defects found will have a significant impact on the cost of producing software. Aims. The aim of this dissertation is to investigate the factors which affect the performance of defect prediction models. Identifying the causes of variation in the way that variables are computed should help to improve the precision of defect prediction models and hence improve the cost effectiveness of defect prediction. Methods. This dissertation is by published work. The first three papers examine variation in the independent variables (code metrics) and the dependent variable (number/location of defects). The fourth and fifth papers investigate the effect that different learners and datasets have on the predictive performance of defect prediction models. The final paper investigates the reported use of different machine learning approaches in studies published between 2000 and 2010. Results. The first and second papers show that independent variables are sensitive to the measurement protocol used, which suggests that the way data is collected affects the performance of defect prediction. The third paper shows that dependent variable data may be untrustworthy, as there is no reliable method for labelling a unit of code as defective or not. The fourth and fifth papers show that the dataset and learner used when producing defect prediction models have an effect on the performance of the models. The final paper shows that the approaches used by researchers to build defect prediction models vary, with good practices being ignored in many papers. Conclusions. The measurement protocols for independent and dependent variables used for defect prediction need to be clearly described so that results can be compared like with like. It is possible that the predictive results of one research group show higher performance than those of another because of the way the metrics were calculated rather than because of the method used to build the model that predicts defect-prone modules. The machine learning approaches used by researchers need to be clearly reported so that the quality of defect prediction studies can be improved and a larger corpus of reliable results gathered.  en_US
dc.language.iso  en  en_US
dc.publisher  University of Hertfordshire  en_US
dc.rights  info:eu-repo/semantics/openAccess  en_US
dc.subject  defect prediction  en_US
dc.subject  machine learning  en_US
dc.subject  metrics  en_US
dc.subject  defects  en_US
dc.subject  data quality  en_US
dc.subject  software engineering  en_US
dc.subject  binary classification  en_US
dc.subject  predictive performance  en_US
dc.subject  systematic literature review  en_US
dc.subject  confusion matrix  en_US
dc.title  Factors Affecting the Performance of Trainable Models for Software Defect Prediction  en_US
dc.type  info:eu-repo/semantics/doctoralThesis  en_US
dc.identifier.doi  10.18745/th.10978
dc.type.qualificationlevel  Doctoral  en_US
dc.type.qualificationname  PhD  en_US
herts.preservation.rarelyaccessed  true

