AutoQALLMs: Automating Web Application Testing Using Large Language Models (LLMs) and Selenium

Mallipeddi, Sindhupriya, Yaqoob, Muhammad, Khan, Javed Ali, Mehmood, Tahir, Mylonas, Alexios and Pitropakis, Nikolaos (2025) AutoQALLMs: Automating Web Application Testing Using Large Language Models (LLMs) and Selenium. Computers, 14 (11): 501. ISSN 2073-431X

Modern web applications change frequently in response to user and market needs, which makes them challenging to test. Manual testing and automation methods often struggle to keep up with these changes. We propose an automated testing framework, AutoQALLMs, that uses several LLMs (Large Language Models), including GPT-4, Claude, and Grok, alongside Selenium WebDriver, BeautifulSoup, and regular expressions. The framework enables one-click testing: users provide a URL as input and receive test results as output, eliminating the need for human intervention. It extracts HTML (Hypertext Markup Language) elements from the webpage and uses the LLM's API to generate Selenium-based test scripts. Regular expressions are applied to the generated scripts to improve their clarity and maintainability. The scripts are executed automatically, and the results, such as pass/fail status and error details, are displayed to the tester. This streamlined input–output process forms the core of the AutoQALLMs framework. We evaluated the framework on 30 websites. The results show that the system drastically reduces the time needed to create test cases, achieves broad test coverage (96%) with the Claude 4.5 LLM, which is competitive with manual scripts (98%), and allows tests to be regenerated rapidly when the webpage structure changes. Feedback from software testing experts confirmed that the proposed AutoQALLMs method for automated web application testing enables faster regression testing, reduces manual effort, and maintains reliable test execution. However, some limitations remain in handling complex page changes and validation. Although Claude 4.5 achieved slightly higher test coverage in the comparative evaluation, GPT-4 was selected as the default model for AutoQALLMs because of its cost-efficiency, reproducibility, and stable script generation across diverse websites. Future improvements may focus on increasing accuracy, adding self-healing techniques, and expanding to more complex testing scenarios.
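The two text-processing stages described in the abstract, element extraction and regex clean-up of the generated script, can be illustrated with a minimal sketch. It is not the authors' implementation. To stay runnable without third-party packages it swaps BeautifulSoup for Python's standard `html.parser`, and it omits the LLM API call and the Selenium execution step entirely. The names `INTERACTIVE_TAGS`, `ElementCollector`, `extract_elements`, `build_prompt`, and `clean_script`, the choice of tags, and the prompt wording are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser

# Assumption: which tags count as "testable" is chosen here for illustration only.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea", "form"}


class ElementCollector(HTMLParser):
    """Collects interactive elements and their attributes from raw HTML."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            # Valueless attributes (e.g. "required") arrive as None; store "".
            self.elements.append(
                {"tag": tag, "attrs": {k: v or "" for k, v in attrs}}
            )


def extract_elements(html):
    """Return a list of {'tag', 'attrs'} dicts for the interactive elements."""
    collector = ElementCollector()
    collector.feed(html)
    return collector.elements


def build_prompt(url, elements):
    """Assemble a hypothetical prompt asking an LLM for Selenium tests."""
    lines = [f"Generate Selenium WebDriver tests in Python for {url}.", "Elements:"]
    for el in elements:
        attrs = "".join(f' {k}="{v}"' for k, v in el["attrs"].items())
        lines.append(f"- <{el['tag']}{attrs}>")
    return "\n".join(lines)


# The fence marker is built at runtime so this listing contains no literal fence.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, re.DOTALL)


def clean_script(llm_reply):
    """Strip chat prose and Markdown fences from an LLM reply, keeping the code."""
    match = FENCE_RE.search(llm_reply)
    code = match.group(1) if match else llm_reply
    code = re.sub(r"\n{3,}", "\n\n", code)  # collapse runs of blank lines
    return code.strip() + "\n"


if __name__ == "__main__":
    page = '<form id="login"><input name="user" type="text"><button id="go">Go</button></form>'
    print(build_prompt("https://example.com", extract_elements(page)))
```

In a full pipeline, the string returned by `build_prompt` would be sent to the chosen LLM, and the output of `clean_script` would be saved and run against a Selenium WebDriver session. Both steps are left out here because they depend on API keys and a browser driver.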

Item Type	Article
Identification Number	10.3390/computers14110501
Additional information	© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Keywords	web application, llm, selenium, testing, gpt
Date Deposited	13 Jan 2026 10:41
Last Modified	18 Feb 2026 20:58

File	computers-14-00501.pdf (Published Version), available under Creative Commons: BY 4.0
