Abstract

The recent emergence of a new coronavirus, SARS-CoV-2, has caused the disease COVID-19, which has been declared a worldwide pandemic. Identifying relevant modules such as target cells is a significant step in characterizing diseases and consequently leads to better diagnosis, treatment and prognosis. High-throughput single-cell RNA-seq (scRNA-seq) technologies have advanced in recent years, enabling researchers to investigate cells individually and understand their biological mechanisms. Unsupervised learning techniques such as data clustering are well suited to the pre-processing step in scRNA-seq data analysis. They can be used to identify groups of genes that belong to a specific cell type based on similar gene expression patterns. However, because this type of data is sparse and high-dimensional, classical clustering methods are not efficient. Nonlinear dimensionality reduction techniques are therefore crucial for improving clustering results. In this work, we aim to find representative clusters of SARS-CoV-2 target cells in the lung by combining dimensionality reduction and clustering techniques. We first perform upstream analysis on the data, including normalization and filtering based on quality control metrics. We then assess the impact of different dimensionality reduction techniques on the clustering results. Our results show that modified Locally Linear Embedding combined with Independent Component Analysis has a strong positive impact on clustering large-scale COVID-19 scRNA-seq data. To validate our findings, we identified target cell types involved in immune system functionality and a list of marker genes that overlap among COVID-19, Influenza A and HSV-1 infections.
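The workflow described above (quality-control filtering, normalization, dimensionality reduction, clustering) can be illustrated with a minimal Python sketch using scikit-learn. It is not the authors' implementation. Several choices are assumptions made only for illustration: the synthetic Poisson count matrix stands in for real scRNA-seq data, the filtering thresholds are arbitrary, ICA is applied before modified LLE (the abstract does not state the order), and k-means with a silhouette score is used as the clustering step and quality measure (neither is named in the abstract). The function names `make_toy_counts`, `preprocess` and `reduce_and_cluster` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FastICA
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.metrics import silhouette_score


def make_toy_counts(n_cells_per_type=60, n_genes=200, n_types=3, seed=0):
    """Simulate a sparse cells-by-genes count matrix (stand-in for real data)."""
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(n_types):
        # Each simulated cell type gets its own mean expression per gene.
        rates = rng.gamma(shape=0.3, scale=5.0, size=n_genes)
        blocks.append(rng.poisson(rates, size=(n_cells_per_type, n_genes)))
    counts = np.vstack(blocks).astype(float)
    true_types = np.repeat(np.arange(n_types), n_cells_per_type)
    return counts, true_types


def preprocess(counts, min_genes=5, min_cells=3, target_sum=1e4):
    """Filter low-quality cells and rarely detected genes, then log-normalize."""
    keep_cells = (counts > 0).sum(axis=1) >= min_genes   # cells with enough detected genes
    counts = counts[keep_cells]
    keep_genes = (counts > 0).sum(axis=0) >= min_cells   # genes detected in enough cells
    counts = counts[:, keep_genes]
    library_size = counts.sum(axis=1, keepdims=True)
    normalized = np.log1p(counts / library_size * target_sum)
    return normalized, keep_cells


def reduce_and_cluster(X, n_clusters=3, n_ica=10, n_neighbors=15, n_embed=2, seed=0):
    """Assumed order: ICA first, then modified LLE, then k-means on the embedding."""
    X_ica = FastICA(n_components=n_ica, random_state=seed, max_iter=1000).fit_transform(X)
    mlle = LocallyLinearEmbedding(
        n_neighbors=n_neighbors,
        n_components=n_embed,
        method="modified",  # selects the modified LLE (MLLE) variant
        random_state=seed,
    )
    X_emb = mlle.fit_transform(X_ica)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X_emb)
    return X_emb, labels


if __name__ == "__main__":
    counts, _ = make_toy_counts()
    X, _ = preprocess(counts)
    X_emb, labels = reduce_and_cluster(X)
    print("embedding shape:", X_emb.shape)
    print("silhouette score:", silhouette_score(X_emb, labels))
```

Comparing dimensionality reduction techniques, as the abstract describes, would amount to swapping the body of `reduce_and_cluster` for each candidate method and comparing a clustering quality score on the same preprocessed matrix.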
