Improving Defect Prediction Models by Combining Classifiers Predicting Different Defects

Petric, Jean

dc.contributor.author	Petric, Jean
dc.date.accessioned	2021-02-25T11:27:55Z
dc.date.available	2021-02-25T11:27:55Z
dc.date.issued	2018-10-02
dc.identifier.uri	http://hdl.handle.net/2299/23943
dc.description.abstract	Background: The software industry spends a great deal of money on finding and fixing defects, and it uses software defect prediction models to identify code that is likely to be defective. However, prediction models have reached a performance bottleneck. Any improvement to prediction models would likely yield fewer defects, reducing costs for companies. Aim: In this dissertation I demonstrate that different families of classifiers find distinct subsets of defects. I show how this finding can be used to design ensemble models that outperform other state-of-the-art software defect prediction models. Method: This dissertation is supported by published work. In the first paper I explore the quality of data, which is a prerequisite for building reliable software defect prediction models. The second and third papers explore the ability of different software defect prediction models to find distinct subsets of defects. The fourth paper explores how software defect prediction models can be improved by combining classifiers that predict different defective components into ensembles. An additional, unpublished work presents a visual technique for analysing the predictions made by individual classifiers and discusses some possible constraints on classifiers used in software defect prediction. Result: Software defect prediction models created by classifiers of different families predict distinct subsets of defects. Ensembles composed of classifiers belonging to different families outperform other ensemble and standalone models. Only a few highly diverse and accurate base models are needed to compose an effective ensemble. The increase in defects this ensemble correctly predicts consistently exceeds the increase in its incorrect predictions. 
Conclusion: Ensembles in software defect prediction should not use majority voting to combine the decisions of classifiers, as this misses correct predictions made by classifiers that uniquely identify defects. Some classifiers may be less successful in software defect prediction because of the complex decision boundaries of defect data. Stacking-based ensembles can outperform other ensemble and standalone techniques. I propose new avenues of research that could further improve the modelling of ensembles in software defect prediction. Researchers should explicitly consider data quality before running experiments in order to establish reliable results.	en_US
dc.language.iso	en	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.rights	Attribution 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/us/	*
dc.subject	software defect prediction	en_US
dc.subject	prediction modelling	en_US
dc.subject	machine learning	en_US
dc.subject	static code analysis	en_US
dc.title	Improving Defect Prediction Models by Combining Classifiers Predicting Different Defects	en_US
dc.type	info:eu-repo/semantics/doctoralThesis	en_US
dc.identifier.doi	doi:10.18745/th.23943	*
dc.type.qualificationlevel	Doctoral	en_US
dc.type.qualificationname	PhD	en_US
dcterms.dateAccepted	2018-10-02
rioxxterms.funder	Default funder	en_US
rioxxterms.identifier.project	Default project	en_US
rioxxterms.version	NA	en_US
rioxxterms.licenseref.uri	https://creativecommons.org/licenses/by/4.0/	en_US
rioxxterms.licenseref.startdate	2021-02-25
herts.preservation.rarelyaccessed	true
rioxxterms.funder.project	ba3b3abd-b137-4d1d-949a-23012ce7d7b9	en_US
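
The abstract's argument against majority voting can be illustrated with a minimal sketch. It uses hypothetical toy labels and predictions, not data from the thesis, and the classifier names and helper functions (`true_positives`, `uniquely_found`, `majority_vote`) are illustrative assumptions rather than the author's implementation. It shows how each classifier can uniquely find a defect that a simple majority vote then discards.

```python
# Hypothetical toy data (NOT from the thesis): 1 = defective module, 0 = clean.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical predictions from three classifiers of different families.
predictions = {
    "naive_bayes":   [1, 0, 0, 1, 0, 1, 0, 0],
    "random_forest": [0, 1, 0, 1, 0, 0, 0, 0],
    "svm":           [0, 0, 1, 1, 0, 0, 1, 0],
}


def true_positives(y_true, y_pred):
    """Indices of defective modules that a classifier correctly flags."""
    return {i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t == 1 and p == 1}


def uniquely_found(y_true, predictions):
    """For each classifier, the defects that no other classifier finds."""
    tps = {name: true_positives(y_true, p) for name, p in predictions.items()}
    unique = {}
    for name, found in tps.items():
        # Union of the true positives of every *other* classifier.
        others = set().union(*(s for other, s in tps.items() if other != name))
        unique[name] = found - others
    return unique


def majority_vote(predictions):
    """Label a module defective only if more than half the classifiers say so."""
    n = len(predictions)
    columns = zip(*predictions.values())  # one tuple of votes per module
    return [1 if sum(col) > n / 2 else 0 for col in columns]


if __name__ == "__main__":
    vote = majority_vote(predictions)
    print("uniquely found per classifier:", uniquely_found(y_true, predictions))
    print("true positives after majority vote:", sorted(true_positives(y_true, vote)))
```

With these toy values, each classifier uniquely identifies one defective module (indices 0, 1 and 2), yet the majority vote flags only module 3, which all three classifiers already agree on. The stacking-based ensembles that the abstract reports as stronger are not modelled here.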