Show simple item record

dc.contributor.author	Shepperd, Martin
dc.contributor.author	Bowes, David
dc.contributor.author	Hall, Tracy
dc.date.accessioned	2014-12-10T14:47:31Z
dc.date.available	2014-12-10T14:47:31Z
dc.date.issued	2014-06-03
dc.identifier.citation	Shepperd, M., Bowes, D. & Hall, T. 2014, 'Researcher bias: The use of machine learning in software defect prediction', IEEE Transactions on Software Engineering, vol. 40, no. 6, 6824804, pp. 603-616. https://doi.org/10.1109/TSE.2014.2322358
dc.identifier.issn	0098-5589
dc.identifier.uri	http://hdl.handle.net/2299/14911
dc.description.abstract	Background. The ability to predict defect-prone software components would be valuable. Consequently, there have been many empirical studies to evaluate the performance of different techniques endeavouring to accomplish this effectively. However, no one technique dominates and so designing a reliable defect prediction model remains problematic. Objective. We seek to make sense of the many conflicting experimental results and understand which factors have the largest effect on predictive performance. Method. We conduct a meta-analysis of all relevant, high-quality primary studies of defect prediction to determine what factors influence predictive performance. This is based on 42 primary studies that satisfy our inclusion criteria and collectively report 600 sets of empirical prediction results. By reverse engineering a common response variable we build a random effects ANOVA model to examine the relative contribution of four model building factors (classifier, data set, input metrics and researcher group) to model prediction performance. Results. Surprisingly, we find that the choice of classifier has little impact upon performance (1.3 percent) and, in contrast, the major (31 percent) explanatory factor is the researcher group. It matters more who does the work than what is done. Conclusion. To overcome this high level of researcher bias, defect prediction researchers should (i) conduct blind analysis, (ii) improve reporting protocols and (iii) conduct more intergroup studies in order to alleviate expertise issues. Lastly, research is required to determine whether this bias is prevalent in other application domains.	en
dc.format.extent	14
dc.format.extent	543659
dc.language.iso	eng
dc.relation.ispartof	IEEE Transactions on Software Engineering
dc.subject	meta-analysis
dc.subject	researcher bias
dc.subject	Software defect prediction
dc.subject	Software
dc.title	Researcher bias: The use of machine learning in software defect prediction	en
dc.contributor.institution	School of Computer Science
dc.contributor.institution	Science & Technology Research Institute
dc.contributor.institution	Centre for Computer Science and Informatics Research
dc.description.status	Peer reviewed
rioxxterms.versionofrecord	10.1109/TSE.2014.2322358
rioxxterms.type	Journal Article/Review
herts.preservation.rarelyaccessed	true
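
The Method described in the abstract pools 600 reported results, reverse engineers a common response variable and fits a random effects ANOVA model over four crossed factors (classifier, data set, input metrics, researcher group). The sketch below illustrates that kind of variance decomposition using statsmodels' MixedLM with variance components; it is a minimal illustration under assumptions, not the authors' actual analysis. The synthetic data, the column names and the use of MCC as the response are stand-ins chosen here for demonstration.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 600  # the meta-analysis pools 600 reported prediction results

# Hypothetical stand-ins for the four coded factors.
df = pd.DataFrame({
    "classifier": rng.choice([f"clf{i}" for i in range(8)], n),
    "dataset": rng.choice([f"ds{i}" for i in range(20)], n),
    "input_metrics": rng.choice(["static", "process", "combined"], n),
    "researcher_group": rng.choice([f"grp{i}" for i in range(15)], n),
})

# Synthetic response: a large researcher-group effect and a small
# classifier effect, mimicking the paper's headline finding.
grp_eff = {g: rng.normal(0, 0.15) for g in df["researcher_group"].unique()}
clf_eff = {c: rng.normal(0, 0.03) for c in df["classifier"].unique()}
df["mcc"] = (0.3
             + df["researcher_group"].map(grp_eff)
             + df["classifier"].map(clf_eff)
             + rng.normal(0, 0.1, n))

# Crossed random effects via variance components: a single dummy group
# spans all rows, and each factor contributes one variance component.
df["unit"] = 1
vc = {
    "classifier": "0 + C(classifier)",
    "dataset": "0 + C(dataset)",
    "metrics": "0 + C(input_metrics)",
    "group": "0 + C(researcher_group)",
}
res = smf.mixedlm("mcc ~ 1", df, groups="unit", re_formula="0",
                  vc_formula=vc).fit()

# Each variance component divided by the total (components plus residual)
# approximates that factor's share of performance variance.
total = res.vcomp.sum() + res.scale
for name, v in zip(res.model.exog_vc.names, res.vcomp):
    print(f"{name}: {v / total:.1%}")

On the paper's real data this style of decomposition is what yields its headline percentages: roughly 31 percent of variance attributed to researcher group against 1.3 percent for choice of classifier.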


Files in this item

Files	Size	Format	View

There are no files associated with this item.

This item appears in the following Collection(s)
