Abstract

Published single-cell datasets are rich resources for investigators who want to address questions not originally asked by the creators of the datasets. These datasets may have been generated with different protocols and analyzed with diverse strategies. The main challenge in using such data is making the various large-scale datasets comparable and reusable in a different context. To address this issue, we developed the single-cell centric database ‘SCPortalen’ (http://single-cell.clst.riken.jp/). The current version of the database covers human and mouse single-cell transcriptomics datasets that are publicly available from the INSDC sites. The original metadata were manually curated and single-cell samples were annotated with standard ontology terms. Common quality assessment procedures were then applied to check the quality of the raw sequences. Furthermore, primary processing of the raw data, followed by advanced analyses and interpretation, was performed from scratch using our pipeline. In addition to the transcriptomics data, SCPortalen provides access to single-cell image files whenever available. The target users of SCPortalen are all researchers interested in specific cell types or population heterogeneity. Through the web interface of SCPortalen, users can easily search, explore and download the single-cell datasets of interest.

Highlights

  • Single-cell omics recently emerged as a powerful toolset to investigate the heterogeneity of large populations of cells with regard to their functions and morphologies [1]

  • The metadata about the biological samples, the protocols used and the library construction methods are manually curated based on the main publication of each dataset

  • To add value to each dataset we developed an analysis pipeline composed of three parts: (i) applying common quality assessment procedures, which enables evaluation of each dataset in a standardized way; (ii) redoing primary data processing, including alignment of raw sequence reads to a reference genome, classification of mapped reads into genomic sub-regions and gene-level expression quantification; and (iii) performing advanced analysis, including clustering of cells (principal component analysis (PCA) [3] and t-Distributed Stochastic Neighbor Embedding (t-SNE) [4]), quantification of possible genomic contamination, functional annotation of expressed genes, cell-cell gene expression correlation and cell-cycle phasing of individual cells
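
As an illustration of the cell-cell gene expression correlation step named in part (iii), the following is a minimal Python sketch and not the SCPortalen pipeline itself. It assumes gene-level read counts per cell are already available, normalizes them to log2(CPM + 1), and computes pairwise Pearson correlations. The function names, the choice of normalization and the toy counts are all assumptions made for this example.

```python
import math


def log_cpm(counts):
    """Convert the raw gene counts of one cell to log2(CPM + 1).

    The normalization is an assumption of this sketch; the source
    does not state which transformation the pipeline applies.
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log2(c / total * 1e6 + 1.0) for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant profile
    return cov / math.sqrt(var_x * var_y)


def cell_cell_correlation(cells):
    """Return {(cell_a, cell_b): r} for every ordered pair of cells."""
    profiles = {name: log_cpm(counts) for name, counts in cells.items()}
    return {
        (a, b): pearson(profiles[a], profiles[b])
        for a in profiles
        for b in profiles
    }


# Hypothetical toy data: three cells, four genes (same gene order per cell).
toy_counts = {
    "cell_A": [10, 0, 5, 20],
    "cell_B": [20, 0, 10, 40],   # same profile as cell_A at twice the depth
    "cell_C": [0, 30, 2, 1],     # dominated by a different gene
}
corr = cell_cell_correlation(toy_counts)
```

In this toy input, cell_A and cell_B differ only in sequencing depth, so depth normalization makes their profiles coincide and their correlation is close to 1, while cell_C correlates negatively with both. A real analysis would run on thousands of genes and would more likely use an array library than plain lists.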

Introduction

Single-cell omics recently emerged as a powerful toolset to investigate the heterogeneity of large populations of cells with regard to their functions and morphologies [1]. Single-cell technologies provide detailed information per biological sample, including gene expression profiles and high-resolution cell images. Improvements in sequencing, microscopy and microfluidic technologies have led to a rapid increase in complex datasets with single-cell resolution. The lack of a database platform for easy comparison and integration of single-cell data has been a major barrier to efficiently investigating and reusing the published results. We developed SCPortalen, a single-cell centric database platform. The aim of this database is to provide a gateway to the untapped potential of single-cell datasets. We first collected published single-cell transcriptomics data in human and mouse. The metadata (detailed information) about the biological samples, the protocols used and the library construction methods are manually curated based on the main publication of each dataset.
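
To make the idea of curated, ontology-annotated metadata more concrete, here is a small Python sketch of how such a record might be represented and searched. It is an illustration under stated assumptions, not the SCPortalen schema: the `SampleRecord` fields, the `find_samples` helper, the placeholder accessions and the protocol labels are all hypothetical, and the Cell Ontology identifiers are used only as examples of standard terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleRecord:
    """One curated single-cell sample (hypothetical schema)."""
    accession: str       # placeholder ID, not a real INSDC accession
    species: str         # e.g. "Homo sapiens" or "Mus musculus"
    cell_type_term: str  # ontology term ID for the cell type
    protocol: str        # single-cell protocol named in the publication
    library_method: str  # library construction method


def find_samples(records, species=None, cell_type_term=None):
    """Return the records matching every filter that is not None."""
    return [
        r for r in records
        if (species is None or r.species == species)
        and (cell_type_term is None or r.cell_type_term == cell_type_term)
    ]


# Hypothetical records; CL:0000084 (T cell) and CL:0000540 (neuron)
# are Cell Ontology terms chosen purely for illustration.
records = [
    SampleRecord("SAMPLE_0001", "Homo sapiens", "CL:0000084",
                 "example protocol A", "example library method X"),
    SampleRecord("SAMPLE_0002", "Mus musculus", "CL:0000540",
                 "example protocol B", "example library method Y"),
    SampleRecord("SAMPLE_0003", "Mus musculus", "CL:0000084",
                 "example protocol A", "example library method X"),
]

mouse_t_cells = find_samples(records, species="Mus musculus",
                             cell_type_term="CL:0000084")
```

Annotating every sample with the same controlled vocabulary is what lets a single query such as this one span datasets whose original publications described the cells in different words.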
