Abstract

In the process of biological knowledge discovery, principal component analysis (PCA) is commonly used to complement clustering analysis, but PCA gives poor visualizations for most gene expression data sets. Here, we propose a measure, PCCF, and use PCA-F to display the clusters obtained with PCCF; both PCCF and PCA-F are modeled from modified cumulative probabilities of genes. Using simulated and experimental data sets, we demonstrate that PCCF is more appropriate and reliable for analyzing gene expression data than other commonly used distance or similarity measures, and that PCA-F is a good visualization technique for identifying PCCF clusters. We focus on data sets in which gene expression values are collected at different time points.

Introduction

In the process of biological knowledge discovery, clustering and visualization analyses play central roles [1,2,3]. K-means clustering depends on choosing an appropriate distance or similarity measure that takes into account the underlying biology and the nature of the data [6]. For most gene expression data, PCA typically gives a poor visualization [8, 9]. Because of these limitations, nonlinear dimension reduction methods that attempt to preserve local structure in the data have been developed, such as t-SNE (t-distributed Stochastic Neighbor Embedding) [8, 10, 11]. t-SNE has been successful in displaying clusters defined by Euclidean distance [8], but it usually gives poor visualizations for clusters defined by the Pearson correlation coefficient (PCC).
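The following minimal Python sketch illustrates why the choice between Euclidean distance and PCC matters for time-course expression profiles. The three gene profiles are hypothetical toy data, and 1 - PCC is used only as a common correlation-based dissimilarity; it is not the PCCF measure proposed here, whose definition is not given in this section.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x, y):
    """Pearson correlation coefficient (PCC); assumes non-constant profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    r = cov / (norm_x * norm_y)
    # Clamp tiny floating-point overshoots so r stays within [-1, 1].
    return max(-1.0, min(1.0, r))


def pcc_distance(x, y):
    """Dissimilarity 1 - PCC: 0 for identical shapes, 2 for mirrored shapes."""
    return 1.0 - pearson(x, y)


# Hypothetical expression values of three genes at five time points.
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],       # rising, low expression level
    "gene_b": [10.0, 20.0, 30.0, 40.0, 50.0],  # same rising shape, 10x the level
    "gene_c": [5.0, 4.0, 3.0, 2.0, 1.0],       # falling, low expression level
}

ref = profiles["gene_a"]
for name in ("gene_b", "gene_c"):
    other = profiles[name]
    print(f"gene_a vs {name}: "
          f"Euclidean = {euclidean(ref, other):.3f}, "
          f"1 - PCC = {pcc_distance(ref, other):.3f}")
```

Euclidean distance places gene_a much closer to the falling gene_c (about 6.3) than to gene_b (about 66.7), because it is dominated by expression level. The correlation dissimilarity instead puts gene_a and gene_b at essentially 0 and gene_a and gene_c at essentially 2, grouping genes by the shape of their time-course profiles. A method tuned to preserve Euclidean neighborhoods can therefore misrepresent clusters that were formed with PCC.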
