An automated approach to identify sarcasm in low-resource language

Khan, Shumaila; Qasim, Iqbal; Khan, Wahab; Khan, Aurangzeb; Ali Khan, Javed; Qahmash, Ayman; Ghadi, Yazeed Yasin

dc.contributor.author	Khan, Shumaila
dc.contributor.author	Qasim, Iqbal
dc.contributor.author	Khan, Wahab
dc.contributor.author	Khan, Aurangzeb
dc.contributor.author	Ali Khan, Javed
dc.contributor.author	Qahmash, Ayman
dc.contributor.author	Ghadi, Yazeed Yasin
dc.contributor.editor	Hassani, Hossein
dc.date.accessioned	2024-12-06T12:00:02Z
dc.date.available	2024-12-06T12:00:02Z
dc.date.issued	2024-12
dc.identifier.citation	Khan, S, Qasim, I, Khan, W, Khan, A, Ali Khan, J, Qahmash, A, Ghadi, YY & Hassani, H (ed.) 2024, 'An automated approach to identify sarcasm in low-resource language', PLoS ONE, vol. 19, no. 12, e0307186, pp. 1-29. https://doi.org/10.1371/journal.pone.0307186
dc.identifier.issn	1932-6203
dc.identifier.other	Jisc: 2477108
dc.identifier.other	publisher-id: pone-d-24-04057
dc.identifier.other	ORCID: /0000-0003-3306-1195/work/173286320
dc.identifier.uri	http://hdl.handle.net/2299/28522
dc.description	© 2024 The Author(s). This is an open access article distributed under the Creative Commons Attribution License, to view a copy of the license, see: https://creativecommons.org/licenses/by/4.0/
dc.description.abstract	Sarcasm detection has attracted attention because of its applications in natural language processing (NLP), but it remains underexplored in low-resource languages such as Urdu, Arabic, Pashto, and Roman-Urdu: most existing work targets English, and comparatively few studies address low-resource languages. This research addresses that gap by evaluating the efficacy of diverse machine learning (ML) algorithms for identifying sarcasm in Urdu. The scarcity of annotated datasets for low-resource languages is a key obstacle. To overcome it, we curated and released a comparatively large dataset, the Urdu Sarcastic Tweets (UST) Dataset, comprising user-generated comments from X (formerly Twitter). Automatic sarcasm detection in text uses computational methods to determine whether a given statement is intended to be sarcastic. The task is challenging because sarcasm is shaped by the user's behavior, attitude, and expression of emotion. To address this, we employ various baseline ML classifiers and evaluate their effectiveness in detecting sarcasm in low-resource languages. The primary models evaluated in this study are support vector machine (SVM), decision tree (DT), K-nearest neighbor (K-NN), linear regression (LR), random forest (RF), Naïve Bayes (NB), and XGBoost. We validated the performance of these ML classifiers on two distinct datasets, the Tanz-Indicator dataset and the UST dataset. The SVM classifier consistently outperformed the other ML models, with an accuracy of 0.85 across various experimental setups. This research underscores the importance of sarcasm detection approaches tailored to the specific linguistic characteristics of low-resource languages and paves the way for future investigations. By providing open access to the UST dataset, we encourage its use as a benchmark for sarcasm detection research in similar linguistic contexts.	en
dc.format.extent	29
dc.format.extent	1600189
dc.language.iso	eng
dc.relation.ispartof	PLoS ONE
dc.subject	Algorithms
dc.subject	Decision Trees
dc.subject	Emotions
dc.subject	Humans
dc.subject	Language
dc.subject	Machine Learning
dc.subject	Natural Language Processing
dc.subject	Support Vector Machine
dc.subject	General
dc.title	An automated approach to identify sarcasm in low-resource language	en
dc.contributor.institution	School of Physics, Engineering & Computer Science
dc.contributor.institution	Cybersecurity and Computing Systems
dc.contributor.institution	Biocomputation Research Group
dc.contributor.institution	Department of Computer Science
dc.description.status	Peer reviewed
dc.identifier.url	http://www.scopus.com/inward/record.url?scp=85211569730&partnerID=8YFLogxK
rioxxterms.versionofrecord	10.1371/journal.pone.0307186
rioxxterms.type	Journal Article/Review
herts.preservation.rarelyaccessed	true
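The abstract lists Naïve Bayes (NB) among the baseline classifiers and reports results as accuracy, but this record does not describe the authors' preprocessing or feature extraction. As an illustration only, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-alpha (Laplace) smoothing plus an accuracy score, assuming plain whitespace tokenization. The names ToyMultinomialNB, tokenize, and accuracy, and the six toy sentences with their labels (1 = sarcastic, 0 = not), are hypothetical and written for this example; they are not the authors' code, not UST or Tanz-Indicator data, and not the scikit-learn API.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split. Real Urdu text would need
    # proper normalization and tokenization.
    return text.lower().split()


class ToyMultinomialNB:
    """Hypothetical multinomial Naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        # Vocabulary is the union of tokens seen in any class.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        n_docs = len(labels)
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / n_docs) for c in self.classes}
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # log P(token | c) with add-alpha smoothing over the vocabulary.
        numerator = self.word_counts[c][token] + self.alpha
        denominator = self.total[c] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def predict_one(self, doc):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(doc):
                if tok in self.vocab:  # tokens unseen in training are skipped
                    score += self._log_likelihood(tok, c)
            scores[c] = score
        return max(scores, key=scores.get)

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the gold labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data (English placeholders, not from any real dataset).
train_docs = [
    "oh great another monday wow so fun",
    "wow what a great idea to cancel the match",
    "sure i love waiting in traffic for hours",
    "the match starts at five today",
    "the weather today is cloudy",
    "i am reading a book about history",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_docs = ["wow great fun", "the book about the weather"]
test_labels = [1, 0]

if __name__ == "__main__":
    model = ToyMultinomialNB().fit(train_docs, train_labels)
    preds = model.predict(test_docs)
    print(preds, accuracy(test_labels, preds))
```

Scoring in log space avoids numeric underflow when many token probabilities are multiplied. The same fit/predict/accuracy loop is the shape in which one would compare several baselines on a held-out split; the choice of features and the other classifiers named in the abstract are outside what this record specifies.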

Files in this item

Name: pone.0307186.pdf
Size: 1.526 MB
Format: PDF


This item appears in the following Collection(s)

Research publications
