Mining software insights: uncovering the frequently occurring issues in low-rating software applications
Author
Khan, Nek Dil
Khan, Javed Ali
Li, Jianqiang
Ullah, Tahir
Zhao, Qing
Abstract
In today’s digital world, app stores have become an essential part of software distribution, providing customers with a wide range of applications and giving software developers opportunities to showcase their work. This study elaborates on the importance of end-user feedback for software evolution. However, the literature has placed more emphasis on high-rated and popular apps while largely ignoring comparatively low-rated apps. Therefore, the proposed approach focuses on end-user reviews collected from 64 low-rated apps representing 14 categories in the Amazon App Store. We critically analyzed feedback from low-rated apps and developed a grounded theory to identify concepts important for software evolution and quality improvement, including user interface (UI) and user experience (UX), functionality and features, compatibility and device-specific, performance and stability, customer support and responsiveness, and security and privacy issues. Then, using a grounded theory and content analysis approach, a novel research dataset was curated to evaluate the performance of baseline machine learning (ML) and state-of-the-art deep learning (DL) algorithms in automatically classifying end-user feedback into frequently occurring issues. Various natural language processing and feature engineering techniques were used to improve and optimize the performance of the ML and DL classifiers. In an experimental study, various ML and DL algorithms, including multinomial naive Bayes (MNB), logistic regression (LR), random forest (RF), multi-layer perceptron (MLP), k-nearest neighbors (KNN), AdaBoost, Voting, convolutional neural network (CNN), long short-term memory (LSTM), bidirectional long short-term memory (BiLSTM), gated recurrent unit (GRU), bidirectional gated recurrent unit (BiGRU), and recurrent neural network (RNN) classifiers, achieved satisfactory results in classifying end-user feedback into commonly occurring issues. MLP, RF, BiGRU, GRU, CNN, LSTM, and Classifiers achieved average accuracies of 94%, 94%, 92%, 91%, 90%, 89%, and 89%, respectively. We employed the SHAP approach to identify the critical features associated with each issue type and thereby enhance the explainability of the classifiers. This research sheds light on areas needing improvement in low-rated apps and opens up new avenues for developers to improve software quality based on user feedback.
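As a rough illustration of the kind of task the abstract describes (assigning a review to an issue category), the following is a minimal from-scratch sketch of multinomial naive Bayes, one of the baseline classifiers named above. It is not the authors' pipeline: the toy reviews, the whitespace tokenizer, the smoothing value, and the function names `train_mnb` and `predict` are all assumptions made for illustration, and only three of the issue categories are used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy reviews (not from the paper's dataset); the labels reuse
# three of the issue categories named in the abstract.
TRAIN = [
    ("app crashes every time i open it", "performance and stability"),
    ("keeps freezing and crashes on startup", "performance and stability"),
    ("slow and laggy crashes often", "performance and stability"),
    ("buttons are tiny and the layout is confusing", "UI and UX"),
    ("ugly layout and confusing menus", "UI and UX"),
    ("asks for too many permissions and leaks my data", "security and privacy"),
    ("worried about privacy it tracks my data", "security and privacy"),
]


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace only.
    return text.lower().split()


def train_mnb(examples, alpha=1.0):
    """Fit multinomial naive Bayes with add-alpha (Laplace) smoothing."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    total_docs = sum(class_docs.values())
    model = {}
    for label in class_docs:
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(class_docs[label] / total_docs),
            "log_lik": {
                w: math.log((word_counts[label][w] + alpha) / denom)
                for w in vocab
            },
        }
    return model


def predict(model, text):
    """Return the label with the highest log posterior.

    Tokens that never appeared in training are ignored.
    """
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for tok in tokenize(text):
            if tok in params["log_lik"]:
                score += params["log_lik"][tok]
        scores[label] = score
    return max(scores, key=scores.get)


model = train_mnb(TRAIN)
for review in ("the app crashes and is slow",
               "confusing layout with tiny buttons",
               "it leaks my data"):
    print(review, "->", predict(model, review))
```

A realistic setup would replace the whitespace tokenizer with proper text preprocessing and feature engineering, and would evaluate on held-out reviews rather than hand-picked sentences.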
Publication date
2024-07-10
Published in
PeerJ Computer Science
Published version
https://doi.org/10.7717/peerj-cs.2115
Other links
http://hdl.handle.net/2299/28242
Related items
Showing items related by title, author, creator and subject.
-
The Relationship between Evolutionary Coupling and Defects in Large Industrial Software
Kirbas, Serkan; Caglayan, Bora; Hall, Tracy; Counsell, Steve; Bowes, David; Sen, Alper; Bener, Ayse (2017-04-05)
Evolutionary coupling (EC) is defined as the implicit relationship between 2 or more software artifacts that are frequently changed together. Changing software is widely reported to be defect-prone. In this study, we ...
-
Insights into software development approaches: mining Q&A repositories
Khan, Arif Ali; Khan, Javed Ali; Akbar, Muhammad Azeem; Zhou, Peng; Fahmideh, Mahdi (2024-01)
Context: Software practitioners adopt approaches like DevOps, Scrum, and Waterfall for high-quality software development. However, limited research has been conducted on exploring software development approaches concerning ...
-
Mutation-aware fault prediction
Bowes, David; Hall, Tracy; Harman, Mark; Jia, Yue; Sarro, Federica; Wu, Fan (ACM Press, 2016-07-18)
We introduce mutation-aware fault prediction, which leverages additional guidance from metrics constructed in terms of mutants and the test cases that cover and detect them. We report the results of 12 sets of experiments, ...