Instance Weighting for Partitioning and Graph Based Clustering

Moggridge, Paul (2025) Instance Weighting for Partitioning and Graph Based Clustering. Doctoral thesis, University of Hertfordshire.
Copy

Clustering algorithms often struggle to achieve robust clustering performance on datasets that contain outliers and imbalanced cluster sizes. While k-means and spectral clustering are popular choices, they demonstrate limitations. Kmeans, a partitioning-based method, while efficient and effective, is sensitive to outliers, leading to inaccurate cluster assignments. Spectral clustering, a graph-based approach, while it is able to model data of arbitrary shape, it performs poorly when clusters have imbalanced instance counts. Traditional instance selection methods address these issues by discarding instances, potentially losing valuable information. This thesis proposes a more nuanced approach by integrating instance weighting into these clustering algorithms. To achieve this, existing literature on instance weighted clustering is reviewed and two novel instance weighted clustering algorithms (LOFIWKM and IWSE) are proposed to demonstrate and evaluate under what conditions instance weighting is effective using both intrinsic and extrinsic clustering accuracy metrics. LOFIWKM builds on k-means and leverages the Local Outlier Factor measure of outlierness to estimate instance density and adjust centroid placement towards higher density areas. The approach was trialled on synthetic and realworld data to investigate how effective the approach is given different quantities and severities of outliers. The approach is also trialled on a sample of realworld avionics data. My experimental findings demonstrate that LOFIWKM effectively mitigates the influence of outliers, significantly improving clustering performance on datasets with different extents of outlier contamination. IWSE builds on a spectral clustering ensemble framework and incorporates density-based weights into the sub-sampling process. The approach is trialled on a variety of synthetic and benchmark datasets to establish when the approach is effective. Then using further synthetic experiments and the MNIST hand written digits dataset the applicability of the method is asserted. My experiments show that IWSE substantially outperforms traditional spectral clustering in terms of clustering performance on datasets with imbalanced clusters that exhibit clear cluster density variations. Overall, this work contributes novel instance weighting frameworks for partitioningbased and graph-based clustering, offering a robust alternative to instance selection. These approaches are generalizable to other clustering algorithms and can be combined with other techniques, opening new avenues for developing more accurate and robust clustering solutions for challenging real-world data.


picture_as_pdf
14096622 Moggridge Paul final submission October 2025.pdf
Available under Creative Commons: BY 4.0

View Download

Atom BibTeX OpenURL ContextObject in Span OpenURL ContextObject Dublin Core MPEG-21 DIDL Data Cite XML EndNote HTML Citation METS MODS RIOXX2 XML Reference Manager Refer ASCII Citation
Export

Downloads
?