
dc.contributor.author: Panday, Deepak
dc.date.accessioned: 2021-12-07T12:00:53Z
dc.date.available: 2021-12-07T12:00:53Z
dc.date.issued: 2021-10-24
dc.identifier.uri: http://hdl.handle.net/2299/25233
dc.description.abstract: The weighted variant of k-Means (Wk-Means), which assigns values to features based on their relevance, is a well-known approach to addressing the shortcomings of k-Means on data containing noisy and irrelevant features. This research aims first to explore how feature weighting can be used for feature selection, second to investigate the performance of Minkowski weighted k-Means (MWk-Means), and its intelligent variant, on datasets defined in different p-norms, and third to address the problem of missing values with a weighted variant of k-Means. A partial distance approach is used to address the problem of missing values for the weighted variant of k-Means. Anomalous clustering has been successfully used to detect natural clusters and initialize centroids in k-Means-type algorithms. Similarly, extensive work has been carried out on using feature weights to rescale features under Minkowski Lp metrics for p ≥ 1. In this thesis, aspects from both of these approaches enable feature weights to be detected based on natural clusters present in the training data, but the clusters are not limited to spherical shape. Two methods, mean-FSFW and max-FSFW, are developed as further extensions of intelligent Minkowski Weighted k-Means (iMWk-Means), where feature weights are used as indices for feature selection with no requirement for user-specified parameters. The proposed feature selection methods are able to significantly reduce the number of noisy features. These methods are further extended to mean-FSFWextPD and max-FSFWextPD to address missing values and are found to be better alternatives than existing imputation methods. The effect of feature weighting on the clustering of datasets defined in varying p-norms is further explored in the thesis. An algorithm that translates a dataset into different p-norms has been proposed. The capability of MWk-Means to read the true shapes of clusters defined in different p-norms is explored.
To address the problem of missing feature values in the weighted variant of k-Means, different missing-value imputation methods are tested. MWk-Means and its intelligent variant are further extended to incorporate the partial distance approach, specifically to address the problem of missing values. All these methods are tested on both synthetic and real-world datasets against three models of noise - noisy features added, feature blurring and cluster-wise feature blurring - where applicable. The noise is generated from Gaussian and uniform distributions at three strengths: no noise, half noise and full noise. Overall, results demonstrate that feature weighting can improve feature selection. The partial-distance approach, with feature weights, is effective at ignoring missing values, and cluster retrieval in various p-norm spaces is effective. [en_US]
dc.language.iso: en [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.rights: Attribution 3.0 United States [*]
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/us/ [*]
dc.subject: Lp space [en_US]
dc.subject: feature weighting [en_US]
dc.subject: kMeans [en_US]
dc.subject: intelligent kMeans [en_US]
dc.subject: Minkowski weighted KMeans [en_US]
dc.subject: partial distance [en_US]
dc.subject: missing pattern [en_US]
dc.subject: missing mechanism [en_US]
dc.subject: Gaussian mixed-model [en_US]
dc.title: Using Feature Weighting as a Tool for Clustering Applications [en_US]
dc.type: info:eu-repo/semantics/doctoralThesis [en_US]
dc.identifier.doi: doi:10.18745/th.25233 [en_US]
dc.identifier.doi: 10.18745/th.25233
dc.type.qualificationlevel: Doctoral [en_US]
dc.type.qualificationname: PhD [en_US]
dcterms.dateAccepted: 2021-10-24
rioxxterms.funder: Default funder [en_US]
rioxxterms.identifier.project: Default project [en_US]
rioxxterms.version: NA [en_US]
rioxxterms.licenseref.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
rioxxterms.licenseref.startdate: 2021-12-07
herts.preservation.rarelyaccessed: true
rioxxterms.funder.project: ba3b3abd-b137-4d1d-949a-23012ce7d7b9 [en_US]
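
Illustrative note on the abstract: the record names a partial distance approach combined with Minkowski feature weights but gives no formula. The sketch below is one plausible reading, not code from the thesis. It assumes the commonly used weighted Minkowski dissimilarity (the sum over features of w^p * |x - c|^p) and the usual partial-distance convention of summing over observed features only and rescaling by the ratio of total to observed features. The function name, the argument names, and the use of None/NaN to mark missing values are all hypothetical.

```python
import math


def partial_minkowski_distance(x, centroid, weights, p=2.0):
    """Weighted Minkowski dissimilarity (raised to the power p), observed features only.

    Assumptions (not taken from the thesis):
      - missing entries of x are marked as None or NaN and are skipped;
      - the partial sum is rescaled by (total features / observed features).
    """
    total = 0.0
    observed = 0
    for xv, cv, wv in zip(x, centroid, weights):
        if xv is None or (isinstance(xv, float) and math.isnan(xv)):
            continue  # skip the missing feature instead of imputing it
        total += (wv ** p) * abs(xv - cv) ** p
        observed += 1
    if observed == 0:
        return math.inf  # nothing observed, so no meaningful distance
    return total * len(x) / observed


# One of three features is missing, unit weights, p = 2
print(partial_minkowski_distance([1.0, None, 3.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))
```

With no missing entries the rescaling factor is 1, so the function reduces to the plain weighted Minkowski dissimilarity.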



Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess