Identification of Software Bugs by Analyzing Natural Language-Based Requirements Using Optimized Deep Learning Features

Haq, Qazi Mazhar ul; Arif, Fahim; Aurangzeb, Khursheed; Ain, Noor ul; Khan, Javed Ali; Rubab, Saddaf; Anwar, Muhammad Shahid



Abstract

Software project outcomes depend heavily on natural language requirements, which are open to diverse interpretations and often suffer from ambiguity, incompleteness, or faults. Researchers are exploring machine learning to predict software bugs, but a more precise and general approach is needed. Accurate bug prediction is crucial for software evolution and user training, and this need has prompted investigations into deep and ensemble learning methods. However, these studies neither generalize well nor remain efficient when extended to other datasets. This paper therefore proposes a hybrid approach that combines multiple techniques and explores their effectiveness on the bug identification problem. Feature selection reduces the dimensionality and redundancy of the feature set so that only relevant features are retained. Transfer learning trains and tests the model on different datasets to measure how much of the learning carries over. An ensemble method measures the performance gain from combining multiple classifiers in one model. The study uses four National Aeronautics and Space Administration (NASA) datasets and four PROMISE datasets. Combining different classifiers improved the model's performance, yielding higher Area Under the Receiver Operating Characteristic Curve (AUC-ROC) values. The results show that combining techniques such as feature selection, transfer learning, and ensemble methods helps optimize software bug prediction models and produces high-performing, useful end models.
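The abstract does not state which feature-selection criterion or classifier-combination rule the authors use. The following Python is a minimal, hypothetical sketch of two of the ideas it describes. It assumes a correlation-based redundancy filter for feature selection and soft voting (averaging each classifier's predicted bug probability) for the ensemble, then scores the predictions with AUC-ROC. The helper names (`drop_redundant`, `soft_vote`, `auc_roc`), the metric names, the labels, and the probabilities are all made up for illustration. They do not come from the NASA or PROMISE datasets or from the paper's results, and the transfer-learning step is not shown.

```python
from __future__ import annotations

import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def drop_redundant(columns: dict[str, Sequence[float]], threshold: float = 0.9) -> list[str]:
    """Hypothetical greedy redundancy filter: keep a feature only if its absolute
    correlation with every already-kept feature is below `threshold`."""
    kept: list[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def soft_vote(prob_lists: Sequence[Sequence[float]]) -> list[float]:
    """Hypothetical ensemble: average each module's bug probability across classifiers."""
    return [sum(col) / len(prob_lists) for col in zip(*prob_lists)]


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a randomly chosen buggy module (label 1)
    scores higher than a randomly chosen clean module (label 0); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one buggy and one clean module")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up metric columns (hypothetical names, not taken from the paper's datasets).
metrics = {
    "loc": [10, 20, 30, 40],
    "ncloc": [8, 18, 27, 38],  # almost perfectly correlated with "loc", so redundant
    "cc": [3, 1, 4, 2],
}
kept = drop_redundant(metrics)
print(kept)  # ['loc', 'cc']

# Made-up labels (1 = buggy) and bug probabilities from two hypothetical classifiers.
labels = [1, 0, 1, 0, 0, 1]
model_a = [0.9, 0.4, 0.35, 0.2, 0.6, 0.7]
model_b = [0.6, 0.3, 0.8, 0.5, 0.1, 0.4]

auc_a = auc_roc(labels, model_a)
auc_b = auc_roc(labels, model_b)
auc_ens = auc_roc(labels, soft_vote([model_a, model_b]))
print(round(auc_a, 3), round(auc_b, 3), round(auc_ens, 3))  # 0.778 0.889 1.0
```

The pairwise AUC computation is the Mann-Whitney formulation and costs O(positives × negatives), which is fine for illustration. A rank-based version scales better on full datasets. The toy scores were chosen so that averaging happens to help. This demonstrates the mechanism only and says nothing about the size of the gains reported in the paper.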