Show simple item record

dc.contributor.author  Bowes, David Hutchinson
dc.date.accessioned  2013-06-27T13:14:49Z
dc.date.available  2013-06-27T13:14:49Z
dc.date.issued  2013-06-13
dc.identifier.uri  http://hdl.handle.net/2299/10978
dc.description.abstract  Context. Reports suggest that defects in code cost the US in excess of $50 billion per year to put right. Defect prediction is an important part of software engineering. It allows developers to prioritise the code that needs to be inspected when trying to reduce the number of defects in code. A small change in the number of defects found will have a significant impact on the cost of producing software. Aims. The aim of this dissertation is to investigate the factors which affect the performance of defect prediction models. Identifying the causes of variation in the way that variables are computed should help to improve the precision of defect prediction models and hence improve the cost effectiveness of defect prediction. Methods. This dissertation is by published work. The first three papers examine variation in the independent variables (code metrics) and the dependent variable (number/location of defects). The fourth and fifth papers investigate the effect that different learners and datasets have on the predictive performance of defect prediction models. The final paper investigates the reported use of different machine learning approaches in studies published between 2000 and 2010. Results. The first and second papers show that independent variables are sensitive to the measurement protocol used, which suggests that the way data is collected affects the performance of defect prediction. The third paper shows that dependent variable data may be untrustworthy, as there is no reliable method for labelling a unit of code as defective or not. The fourth and fifth papers show that the dataset and learner used when producing defect prediction models have an effect on the performance of the models. The final paper shows that the approaches used by researchers to build defect prediction models vary, with good practices being ignored in many papers. Conclusions. The measurement protocols for independent and dependent variables used for defect prediction need to be clearly described so that results can be compared like with like. It is possible that the predictive results of one research group show higher performance than those of another because of the way the metrics were calculated rather than because of the method used to build the model that predicts defect-prone modules. The machine learning approaches used by researchers need to be clearly reported so that the quality of defect prediction studies can be improved and a larger corpus of reliable results gathered.  en_US
dc.language.iso  en  en_US
dc.publisher  University of Hertfordshire  en_US
dc.rights  info:eu-repo/semantics/openAccess  en_US
dc.subject  defect prediction  en_US
dc.subject  machine learning  en_US
dc.subject  metrics  en_US
dc.subject  defects  en_US
dc.subject  data quality  en_US
dc.subject  software engineering  en_US
dc.subject  binary classification  en_US
dc.subject  predictive performance  en_US
dc.subject  systematic literature review  en_US
dc.subject  confusion matrix  en_US
dc.title  Factors Affecting the Performance of Trainable Models for Software Defect Prediction  en_US
dc.type  info:eu-repo/semantics/doctoralThesis  en_US
dc.identifier.doi  10.18745/th.10978
dc.type.qualificationlevel  Doctoral  en_US
dc.type.qualificationname  PhD  en_US
herts.preservation.rarelyaccessed  true

