A Hybrid Spam Detection Method Based on Unstructured Datasets

Angelopoulou, Olga, Shao, Y, Trovati, Marcello, Shi, Q, Asimakopoulou, E and Bessis, Nik (2017) A Hybrid Spam Detection Method Based on Unstructured Datasets. Soft Computing, 21 (1). pp. 233-243. ISSN 1432-7643

The identification of non-genuine or malicious messages poses a variety of challenges due to the continuous changes in the techniques utilised by cyber-criminals. In this article, we propose a hybrid detection method based on a combination of image and text spam recognition techniques. In particular, the former is based on sparse representation-based classification, which focuses on the global and local image features, and a dictionary learning technique to achieve a spam and a ham sub-dictionary. On the other hand, the textual analysis is based on semantic properties of documents to assess the level of maliciousness. More specifically, we are able to distinguish between meta-spam and real spam. Experimental results show the accuracy and potential of our approach.

Item Type	Article
Identification Number	10.1007/s00500-015-1959-z
Additional information	This document is the accepted manuscript version of the following article: Shao, Y., Trovati, M., Shi, Q. et al. Soft Comput (2017) 21: 233. The final publication is available at Springer via http://dx.doi.org/10.1007/s00500-015-1959-z. © Springer-Verlag Berlin Heidelberg 2015.
Keywords	image spam, text spam, semantic networks, classification, subclass discriminant analysis, feature selection, sparse representation
Date Deposited	15 May 2025 13:27
Last Modified	24 Dec 2025 01:03