Abstract
Background
Clustering of projected data is common in biomedical research. Principal component analysis (PCA) is widely used for the projection step. PCA focuses on data dispersion (variance), whereas clustering identifies data concentrations (neighborhoods), so the two aims conflict. This study re-evaluates combinations of PCA and other projection methods with common clustering algorithms.

Methods
Six projection methods (PCA, independent component analysis (ICA), Isomap, multidimensional scaling (MDS), t-SNE, UMAP) were combined with five clustering algorithms (k-means, k-medoids, single linkage, Ward's method, average linkage). Projections and clusterings were evaluated with a numerical criterion of clustering performance and with a visual criterion: the projected data were plotted on a Voronoi tessellation plane, with points colored by class. Nine artificial and five real biomedical datasets were analyzed.

Results
No combination consistently captured the prior classifications in both the projections and the clusters. Visual inspection proved essential. PCA was often, but not always, outperformed or equaled by neighborhood-based methods (UMAP, t-SNE) and by manifold learning (Isomap).

Conclusions
The results argue against using PCA as the standard projection method before clustering. Method selection should therefore be data-specific, tailoring the choice of projection and clustering to each biomedical dataset. To support this process, we propose a novel visualization technique that combines Voronoi tessellation with class-wise color coding.
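The evaluation described above follows a simple pipeline: project the data, cluster the projection, then score the clusters against the prior classes. The pure-Python sketch below illustrates one such combination (PCA followed by k-means) under stated assumptions. The abstract does not name its numerical criterion, so the adjusted Rand index (ARI) is used here as a stand-in. The two toy datasets are invented for illustration and are not the study's datasets, and `pca_project`, `kmeans`, and `adjusted_rand_index` are hypothetical helpers, not the authors' code. The second toy dataset illustrates the variance-versus-neighborhood conflict: when the highest-variance axis carries no class structure, the first principal component follows that axis and the clusters no longer match the classes.

```python
import math
import random
from collections import Counter


def pca_project(X, n_components):
    """Project the rows of X onto their leading principal components.

    Eigenvectors of the sample covariance matrix are found by power
    iteration with deflation, which is adequate for small toy data only.
    """
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(0)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _step in range(500):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance captured along direction v.
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflate so the next pass converges to the next-largest component.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in Xc]


def kmeans(P, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [list(P[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(P, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in P]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(P, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings (1.0 = identical partitions)."""
    def pairs(m):
        return m * (m - 1) / 2

    sum_ij = sum(pairs(m) for m in Counter(zip(a, b)).values())
    sum_a = sum(pairs(m) for m in Counter(a).values())
    sum_b = sum(pairs(m) for m in Counter(b).values())
    expected = sum_a * sum_b / pairs(len(a))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


rng = random.Random(42)

# Toy data 1: three well-separated blobs in 4-D. The class structure also
# carries most of the variance, so the first two PCs preserve it.
X1, y1 = [], []
for cls, center in enumerate([(0, 0, 0, 0), (10, 0, 0, 0), (0, 10, 0, 0)]):
    for _ in range(30):
        X1.append([m + rng.gauss(0.0, 1.0) for m in center])
        y1.append(cls)
ari_blobs = adjusted_rand_index(y1, kmeans(pca_project(X1, 2), 3))

# Toy data 2: two classes separated along axis 0, plus a high-variance but
# structureless axis 1. The first PC follows axis 1 and loses the classes.
X2, y2 = [], []
for i in range(100):
    cls = i % 2
    X2.append([(-3.0 if cls == 0 else 3.0) + rng.gauss(0.0, 0.5), rng.gauss(0.0, 20.0)])
    y2.append(cls)
ari_noise = adjusted_rand_index(y2, kmeans(pca_project(X2, 1), 2))

print(f"ARI, blobs:      {ari_blobs:.2f}")
print(f"ARI, noise axis: {ari_noise:.2f}")
```

On these toy data one would expect an ARI near 1 for the blobs and near 0 for the noise-axis case. In practice, library implementations are the more likely choice, for example scikit-learn's `PCA`, `KMeans`, and `adjusted_rand_score`; for the visual criterion, `scipy.spatial.Voronoi` is one way to compute the tessellation of the projected points, whose cells can then be colored by prior class. These are suggestions, not the tools the study reports using.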