dc.contributor.author | Raeesi, H. | |
dc.contributor.author | Khosravi, A. | |
dc.contributor.author | Sarhadi, P. | |
dc.date.accessioned | 2025-01-30T04:15:01Z | |
dc.date.available | 2025-01-30T04:15:01Z | |
dc.date.issued | 2025-01-30 | |
dc.identifier.citation | Raeesi, H., Khosravi, A. & Sarhadi, P. 2025, 'Safe Reinforcement Learning by Shielding based Reachable Zonotopes for Autonomous Vehicles', International Journal of Engineering, vol. 38, no. 1, pp. 21-34. https://doi.org/10.5829/ije.2025.38.01a.03 | |
dc.identifier.issn | 1025-2495 | |
dc.identifier.other | RIS: urn:E91C2D7FEB0028514E6C602904DC2256 | |
dc.identifier.other | RIS: 196484 | |
dc.identifier.other | ORCID: /0000-0002-6004-676X/work/177105737 | |
dc.identifier.uri | http://hdl.handle.net/2299/28758 | |
dc.description | © 2025 The Author(s). This is an open access article distributed under the Creative Commons Attribution License. To view a copy of the license, see: https://creativecommons.org/licenses/by/4.0/ | |
dc.description.abstract | The field of autonomous vehicles (AVs) has been the subject of extensive research in recent years. If implemented, AVs could contribute greatly to the quality of daily life. Before this can be accomplished, however, a safe driver model that controls the autonomous vehicle is required. Reinforcement Learning (RL) is one method suitable for creating such models, but RL agents typically perform random actions during training, which poses a safety risk when driving an AV. To address this issue, shielding has been proposed. By predicting the future state after an action is taken and determining whether that state is safe, the shield decides whether the action itself is safe. For this purpose, reachable zonotopes must be provided so that, at each planning stage, the vehicle's reachable set does not intersect with any obstacle. To this end, we propose a Safe Reinforcement Learning by Shielding-based Reachable Zonotopes (SRLSRZ) approach. It is built around Twin Delayed DDPG (TD3) and compared against it. During training and execution, the shielded system has zero collisions, and its efficiency is similar to or even better than that of TD3. The shield-based learning approach is shown to be effective in enabling the agent to learn not to propose unsafe actions. Simulation results indicate that a vehicle whose unsafe set is adjacent to the area providing the greatest reward performs better with SRLSRZ than with other methods currently considered state-of-the-art for safe RL. | en |
dc.format.extent | 14 | |
dc.format.extent | 1190314 | |
dc.language.iso | eng | |
dc.relation.ispartof | International Journal of Engineering | |
dc.subject | Safe Reinforcement Learning | |
dc.subject | Shielding | |
dc.subject | Reachable set | |
dc.subject | Autonomous Vehicles | |
dc.title | Safe Reinforcement Learning by Shielding based Reachable Zonotopes for Autonomous Vehicles | en |
dc.contributor.institution | School of Physics, Engineering & Computer Science | |
dc.contributor.institution | Centre for Engineering Research | |
dc.contributor.institution | Department of Engineering and Technology | |
dc.contributor.institution | Communications and Intelligent Systems | |
dc.contributor.institution | Networks and Security Research Centre | |
dc.description.status | Peer reviewed | |
dc.identifier.url | http://www.scopus.com/inward/record.url?scp=85203859361&partnerID=8YFLogxK | |
rioxxterms.versionofrecord | 10.5829/ije.2025.38.01a.03 | |
rioxxterms.type | Journal Article/Review | |
herts.preservation.rarelyaccessed | true | |