Abstract

Single-cell RNA-sequencing (scRNA-seq) data provide opportunities to reveal new insights into many biological problems, such as elucidating cell types. An effective approach to elucidating cell types in complex tissues is to partition the cells into separate subgroups via clustering techniques, where the cells in a specific cluster belong to the same cell type based on their gene expression patterns. In this work, we present a novel multiple kernel clustering framework for scRNA-seq data via locality preserving kernel alignment. Specifically, we first generate a series of similarity kernel matrices using different kernel functions. Then we cast the clustering task as a multiple kernel k-means problem in which the kernels are aligned in a local manner, i.e., the similarities of each sample to its k-nearest neighbours are aligned with the ideal similarity matrix. In our method, the clustering process focuses on closer sample pairs that should stay together and avoids relying on unreliable similarity estimates for more distant sample pairs. In addition, we construct a local Laplacian matrix for each sample to enforce that closer samples are assigned similar labels. In this manner, the local structure of the data is preserved and exploited to produce better alignment for clustering. An alternating updating algorithm with theoretical analysis is developed to solve the proposed problem. We evaluate the proposed method on various real scRNA-seq datasets, and the results show that it obtains superior results compared with other state-of-the-art approaches.
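The abstract describes the pipeline only at a high level, so what follows is a minimal NumPy sketch of the general idea under stated assumptions, not the authors' implementation. It assumes: Gaussian kernels at a few bandwidths plus a cosine kernel as the base kernels; neighbourhoods of size `tau` (the abstract's k-nearest neighbours) ranked by the average kernel; the common multiple kernel k-means convention K_gamma = sum_p gamma_p^2 K_p with gamma on the simplex; and a localized objective sum_i Tr(K_gamma^(i) (I - H^(i) H^(i)^T)), where K^(i) and H^(i) keep only the rows/columns in sample i's neighbourhood. Summing over neighbourhoods reduces each kernel to K_p multiplied elementwise by a count matrix C (how many neighbourhoods contain both samples), so far-apart pairs never enter the objective. The H-step then takes the top eigenvectors, and the gamma-step has the closed form gamma_p proportional to 1 / b_p. The per-sample local Laplacian term and any regularizer from the paper are omitted. All function names (`build_kernels`, `neighbourhood_counts`, `kmeans`, `local_mkkm`) and parameter values are hypothetical.

```python
import numpy as np


def build_kernels(X, gammas=(0.1, 1.0, 10.0)):
    """Base kernels (assumption): Gaussian kernels at several bandwidths plus a cosine kernel."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    scale = d2.mean() + 1e-12                       # makes the bandwidths scale-free
    kernels = [np.exp(-g * d2 / scale) for g in gammas]
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    kernels.append(Xn @ Xn.T)                       # cosine-similarity kernel
    return kernels


def neighbourhood_counts(kernels, tau):
    """C[j, l] = number of samples whose tau-neighbourhood contains both j and l."""
    K_avg = sum(kernels) / len(kernels)
    n = K_avg.shape[0]
    N = np.zeros((n, n))
    for i in range(n):
        N[i, np.argsort(-K_avg[i])[:tau]] = 1.0     # the tau samples most similar to i
    return N.T @ N


def kmeans(Z, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centres = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(Z[np.argmax(d)])
    centres = np.array(centres)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        new = np.array([Z[labels == c].mean(axis=0) if np.any(labels == c) else centres[c]
                        for c in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels


def local_mkkm(X, k, tau=10, n_iter=20):
    """Alternating minimisation of sum_p gamma_p^2 * (Tr(M_p) - Tr(H^T M_p H)),
    where M_p = K_p * C keeps only within-neighbourhood similarities (assumed objective)."""
    kernels = build_kernels(X)
    C = neighbourhood_counts(kernels, tau)
    M = [Kp * C for Kp in kernels]                  # sum_i A_i K_p A_i == K_p (elementwise) C
    gamma = np.full(len(M), 1.0 / len(M))
    for _ in range(n_iter):
        # H-step: top-k eigenvectors of the combined localized kernel
        Mg = sum(g ** 2 * Mp for g, Mp in zip(gamma, M))
        _, vecs = np.linalg.eigh(Mg)                # eigenvalues in ascending order
        H = vecs[:, -k:]
        # gamma-step: min sum_p gamma_p^2 b_p s.t. sum_p gamma_p = 1 => gamma_p proportional to 1/b_p
        b = np.array([np.trace(Mp) - np.trace(H.T @ Mp @ H) for Mp in M])
        inv = 1.0 / np.maximum(b, 1e-12)
        gamma = inv / inv.sum()
    Z = H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-12)  # row-normalise the embedding
    return kmeans(Z, k), gamma
```

For example, on three well-separated synthetic blobs, `labels, gamma = local_mkkm(X, k=3, tau=15)` should recover the blobs and return kernel weights that sum to one. On real scRNA-seq data, `X` would presumably be a normalized, dimension-reduced expression matrix; the abstract does not specify the preprocessing, so that is also an assumption.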

Highlights

  • Recent literature indicates that single-cell measurements play an important role in understanding cellular heterogeneity [1]–[5] and cell differentiation [6], [7]

  • A straightforward approach to elucidating cell types in complex tissues is to partition the cells into separate subgroups via clustering techniques [10]–[13], which can be regarded as an unsupervised classification problem [14]–[16]

  • Many existing techniques can be used for this task, such as principal component analysis (PCA) [17] for dimensionality reduction, spectral clustering [18], and k-means [19]


Summary

Introduction

Recent literature indicates that single-cell measurements play an important role in understanding cellular heterogeneity [1]–[5] and cell differentiation [6], [7]. These datasets provide opportunities to reveal new insights into many biological problems, e.g., elucidating cell types; on the other hand, the sheer amount of data poses computational challenges. A straightforward approach to elucidating cell types in complex tissues is to partition the cells into separate subgroups via clustering techniques [10]–[13], which can be regarded as an unsupervised classification problem [14]–[16]. Unlike bulk RNA-seq or gene expression microarrays, scRNA-seq data contain a high level of noise and many missing values due to technical and sampling issues [20]–[22]. Gene expression levels can also vary greatly even between cells of the same type, which can degrade the performance of existing clustering approaches [23]–[27].

