dc.contributor.author: Chowdhury, Stiphen
dc.date.accessioned: 2021-12-06T13:06:56Z
dc.date.available: 2021-12-06T13:06:56Z
dc.date.issued: 2021-09-01
dc.identifier.uri: http://hdl.handle.net/2299/25224
dc.description.abstract: K-Means is the most popular and widely used clustering algorithm, but it cannot recover non-spherical clusters in data sets. DBSCAN is arguably the most popular algorithm for recovering arbitrary-shape clusters, which makes tackling the weaknesses of this density-based clustering algorithm of great interest. One concern is that DBSCAN requires two parameters; another is that it cannot recover clusters of widely varying density. The problem at the heart of this thesis is that, during the clustering process, DBSCAN uses all the available features and treats them equally regardless of their degree of relevance in the data set, which can have a negative impact. This thesis addresses the above problems by laying the foundation of feature-weighted density-based clustering. Specifically, the thesis introduces a density-based clustering algorithm using reverse nearest neighbour, DBSCANR, that requires fewer parameters than DBSCAN for recovering clusters. DBSCANR is based on the insight that in real-world data sets the densities of the arbitrary-shape clusters to be recovered within a data set are very different from each other. The thesis extends DBSCANR to weighted DBSCANR, W-DBSCANR, which exploits a feature weighting technique to assign different levels of relevance to the features in a data set. The thesis extends W-DBSCANR further by using the Minkowski metric so that the weights can be interpreted as feature re-scaling factors; the resulting algorithm is named MW-DBSCANR. Experiments on both artificial and real-world data sets demonstrate the superiority of our method over DBSCAN-type algorithms. These weighted algorithms considerably reduce the impact of irrelevant features while recovering arbitrary-shape clusters of different densities in a high-dimensional data set.
Within this context, this thesis incorporates a popular algorithm, feature selection using feature similarity (FSFS), into both W-DBSCANR and MW-DBSCANR to address the problem of feature selection. This unsupervised feature selection algorithm makes use of feature clustering and feature similarity to reduce the number of features in a data set. With a similar aim, and exploiting the concept of feature similarity, the thesis introduces a method, density-based feature selection using feature similarity (DBFSFS), that takes density-based cluster structure into consideration when reducing the number of features in a data set. The thesis then applies the developed method to real-world high-dimensional gene expression data sets. DBFSFS improves cluster recovery by substantially reducing the number of features in high-dimensional, low-sample-size data sets. [en_US]
dc.language.iso: en [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.rights: Attribution 3.0 United States [*]
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/us/ [*]
dc.subject: clustering [en_US]
dc.subject: density-based clustering [en_US]
dc.subject: feature weighting [en_US]
dc.subject: feature selection [en_US]
dc.subject: DBSCAN [en_US]
dc.subject: reverse nearest neighbour [en_US]
dc.title: Learning Feature Weights for Density-Based Clustering [en_US]
dc.type: info:eu-repo/semantics/doctoralThesis [en_US]
dc.identifier.doi: doi:10.18745/th.25224 [*]
dc.identifier.doi: 10.18745/th.25224
dc.type.qualificationlevel: Doctoral [en_US]
dc.type.qualificationname: PhD [en_US]
dcterms.dateAccepted: 2021-09-01
rioxxterms.funder: Default funder [en_US]
rioxxterms.identifier.project: Default project [en_US]
rioxxterms.version: NA [en_US]
rioxxterms.licenseref.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
rioxxterms.licenseref.startdate: 2021-12-06
herts.preservation.rarelyaccessed: true
rioxxterms.funder.project: ba3b3abd-b137-4d1d-949a-23012ce7d7b9 [en_US]



Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess