dc.contributor.author | Chowdhury, Stiphen | |
dc.date.accessioned | 2021-12-06T13:06:56Z | |
dc.date.available | 2021-12-06T13:06:56Z | |
dc.date.issued | 2021-09-01 | |
dc.identifier.uri | http://hdl.handle.net/2299/25224 | |
dc.description.abstract | K-Means is the most popular and widely used clustering algorithm. However, it
cannot recover non-spherical clusters in data sets. DBSCAN is arguably the most
popular algorithm for recovering arbitrarily shaped clusters, which is why
addressing the weaknesses of this density-based clustering algorithm is of great interest.
One concern is that DBSCAN requires two parameters and cannot recover clusters
whose densities vary widely. The problem at the heart of this thesis is that,
during the clustering process, DBSCAN uses all the available features and
treats them equally regardless of their degree of relevance in the data set,
which can harm cluster recovery.
This thesis addresses the above problems by laying the foundations of feature-weighted
density-based clustering. Specifically, the thesis introduces a density-based
clustering algorithm using reverse nearest neighbours, DBSCANR, which requires
fewer parameters than DBSCAN to recover clusters. DBSCANR is based
on the insight that, in real-world data sets, the densities of the arbitrarily shaped clusters
to be recovered differ greatly from one another.
The thesis extends DBSCANR to weighted DBSCANR (W-DBSCANR)
by using a feature weighting technique to assign different degrees of
relevance to the features in a data set. The thesis then extends W-DBSCANR further
to MW-DBSCANR, which uses the Minkowski metric so that the weights can be interpreted as feature
rescaling factors. Experiments on both artificial and real-world
data sets demonstrate the superiority of our methods over DBSCAN-type
algorithms. These weighted algorithms considerably reduce the impact of irrelevant
features while recovering arbitrarily shaped clusters of differing densities
in high-dimensional data sets.
Within this context, the thesis incorporates a popular algorithm, feature selection
using feature similarity (FSFS), into both W-DBSCANR and MW-DBSCANR to
address the problem of feature selection. This unsupervised feature selection algorithm
uses feature clustering and feature similarity to reduce the number
of features in a data set. With a similar aim, and exploiting the concept of feature
similarity, the thesis introduces density-based feature selection using
feature similarity (DBFSFS), which takes density-based cluster structure into account
when reducing the number of features in a data set. The thesis then applies
the developed method to real-world high-dimensional gene expression data sets.
DBFSFS improves cluster recovery by substantially reducing the number of
features in high-dimensional, low-sample-size data sets. | en_US |
dc.language.iso | en | en_US |
dc.rights | info:eu-repo/semantics/openAccess | en_US |
dc.rights | Attribution 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by/3.0/us/ | * |
dc.subject | clustering | en_US |
dc.subject | density-based clustering | en_US |
dc.subject | feature weighting | en_US |
dc.subject | feature selection | en_US |
dc.subject | DBSCAN | en_US |
dc.subject | reverse nearest neighbour | en_US |
dc.title | Learning Feature Weights for Density-Based Clustering | en_US |
dc.type | info:eu-repo/semantics/doctoralThesis | en_US |
dc.identifier.doi | doi:10.18745/th.25224 | * |
dc.identifier.doi | 10.18745/th.25224 | |
dc.type.qualificationlevel | Doctoral | en_US |
dc.type.qualificationname | PhD | en_US |
dcterms.dateAccepted | 2021-09-01 | |
rioxxterms.funder | Default funder | en_US |
rioxxterms.identifier.project | Default project | en_US |
rioxxterms.version | NA | en_US |
rioxxterms.licenseref.uri | https://creativecommons.org/licenses/by/4.0/ | en_US |
rioxxterms.licenseref.startdate | 2021-12-06 | |
herts.preservation.rarelyaccessed | true | |
rioxxterms.funder.project | ba3b3abd-b137-4d1d-949a-23012ce7d7b9 | en_US |