Partitional Clustering of Malware Using K-Means
Author
Cordeiro De Amorim, Renato
Komisarczuk, Peter
Attention
2299/17080
Abstract
This paper describes a novel method aiming to cluster datasets containing malware behavioural data. Our method transform the data into an standardised data matrix that can be used in any clustering algorithm, finds the number of clusters in the data set and includes an optional visualization step for high-dimensional data using principal component analysis. Our clustering method deals well with categorical data, and it is able to cluster the behavioural data of 17,000 websites, acquired with Capture-HPC, in less than 2 min