Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Bootstrapping estimates of stability for clusters, observations and model selection

Yu, Han, Chapman, Brian, Di Florio, Arianna ORCID: https://orcid.org/0000-0003-0338-2748, Eischen, Ellen, Gotz, David, Jacob, Mathews and Blair, Rachael Hageman 2019. Bootstrapping estimates of stability for clusters, observations and model selection. Computational Statistics 34 (1), pp. 349-372. doi:10.1007/s00180-018-0830-y

PDF - Accepted Post-Print Version

Abstract

Clustering is a challenging problem in unsupervised learning. In lieu of a gold standard, stability has become a valuable surrogate for performance and robustness. In this work, we propose a non-parametric bootstrapping approach to estimating the stability of a clustering method, which also captures the stability of the individual clusters and observations. This flexible framework enables different types of comparisons between clusterings and can be used in connection with two possible bootstrap approaches for stability. The first approach, scheme 1, can be used to assess confidence (stability) around the clustering obtained from the original dataset, based on bootstrap replications. A second approach, scheme 2, searches over the bootstrap clusterings for an optimally stable partitioning of the data. The two schemes accommodate different model assumptions that can be motivated by an investigator’s trust (or lack thereof) in the original data and by additional computational considerations. We propose a hierarchical visualization, extrapolated from the stability profiles, that gives insight into the separation of groups, and projected visualizations for inspecting the stability of individual observations. Our approaches show good performance in simulation and on real data. These approaches can be implemented using the R package bootcluster, which is available on the Comprehensive R Archive Network (CRAN).
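
As a rough illustration of the idea behind scheme 1 (cluster the original data, then ask how well each original cluster is recovered across bootstrap replications), the following Python sketch may help. It is not the authors' method and does not reproduce the bootcluster API: the toy one-dimensional k-means, the Jaccard-based agreement score, and all function names are assumptions made purely for illustration.

```python
import random


def nearest(x, cents):
    """Index of the centroid closest to x."""
    return min(range(len(cents)), key=lambda j: abs(x - cents[j]))


def kmeans_1d(xs, k, iters=50):
    """Toy deterministic 1-D k-means (illustrative stand-in for any clustering method)."""
    s = sorted(xs)
    # Initialise centroids at evenly spaced quantiles of the data.
    cents = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(x, cents)].append(x)
        # An empty group keeps its previous centroid.
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, cents)]
        if new == cents:
            break
        cents = new
    return sorted(cents)


def cluster_sets(xs, cents):
    """Partition the indices of xs by nearest centroid."""
    sets = [set() for _ in cents]
    for i, x in enumerate(xs):
        sets[nearest(x, cents)].add(i)
    return sets


def jaccard(a, b):
    """Jaccard similarity of two index sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def bootstrap_stability(xs, k, n_boot=100, seed=0):
    """Mean best-match Jaccard agreement per original cluster (assumed score)."""
    rng = random.Random(seed)
    ref = cluster_sets(xs, kmeans_1d(xs, k))  # clustering of the original data
    totals = [0.0] * k
    for _ in range(n_boot):
        # Non-parametric bootstrap: resample observations with replacement.
        sample = [rng.choice(xs) for _ in xs]
        # Cluster the resample, then map the original observations onto it.
        boot = cluster_sets(xs, kmeans_1d(sample, k))
        for j, c in enumerate(ref):
            totals[j] += max(jaccard(c, b) for b in boot)
    return [t / n_boot for t in totals]


# Two well-separated groups; both clusters should come out highly stable.
data = [0.0, 0.1, 0.2, 0.3, 0.4, 10.0, 10.1, 10.2, 10.3, 10.4]
scores = bootstrap_stability(data, k=2)
print(scores)
```

Scores near 1 suggest a cluster is recovered in most bootstrap replications, and lower scores suggest it is sensitive to resampling. The stability measures defined in the paper, including those for individual observations, may differ from this simple agreement score.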

Item Type: Article
Date Type: Publication
Status: Published
Schools: Medicine
MRC Centre for Neuropsychiatric Genetics and Genomics (CNGG)
Publisher: Springer Verlag (Germany)
ISSN: 0943-4062
Date of First Compliant Deposit: 3 September 2018
Date of Acceptance: 18 August 2018
Last Modified: 12 Nov 2023 19:36
URI: https://orca.cardiff.ac.uk/id/eprint/114553

Citation Data

Cited 14 times in Scopus.
