Abstract

BackgroundA key challenge in the emerging field of single-cell RNA-Seq is to characterize phenotypic diversity between cells and visualize this information in an informative manner. A common technique when dealing with high-dimensional data is to project the data to 2 or 3 dimensions for visualization. However, there are a variety of methods to achieve this result and once projected, it can be difficult to ascribe biological significance to the observed features. Additionally, when analyzing single-cell data, the relationship between cells can be obscured by technical confounders such as variable gene capture rates.ResultsTo aid in the analysis and interpretation of single-cell RNA-Seq data, we have developed FastProject, a software tool which analyzes a gene expression matrix and produces a dynamic output report in which two-dimensional projections of the data can be explored. Annotated gene sets (referred to as gene ‘signatures’) are incorporated so that features in the projections can be understood in relation to the biological processes they might represent. FastProject provides a novel method of scoring each cell against a gene signature so as to minimize the effect of missed transcripts as well as a method to rank signature-projection pairings so that meaningful associations can be quickly identified. Additionally, FastProject is written with a modular architecture and designed to serve as a platform for incorporating and comparing new projection methods and gene selection algorithms.ConclusionsHere we present FastProject, a software package for two-dimensional visualization of single cell data, which utilizes a plethora of projection methods and provides a way to systematically investigate the biological relevance of these low dimensional representations by incorporating domain knowledge.Electronic supplementary materialThe online version of this article (doi:10.1186/s12859-016-1176-5) contains supplementary material, which is available to authorized users.

Highlights

  • A key challenge in the emerging field of single-cell RNA-Seq is to characterize phenotypic diversity between cells and visualize this information in an informative manner

  • Software FastProject has been implemented as a command line Python package

  • As an input the software receives: (1) an expression matrix in a tab-delimited format. (2) a set of gene signatures, using the standard GMT format. Such sets of directed signatures are publicly available from various databases, such as Molecular signatures database (MsigDB) [12] and NetPath [24]

Read more

Summary

Introduction

A key challenge in the emerging field of single-cell RNA-Seq is to characterize phenotypic diversity between cells and visualize this information in an informative manner. An appealing way to make sense out of this immense data is to project it into a two dimensional scatter-plot, where each cell is represented by a single point While such representations provide an easy way to see obvious stratification of cells into a taxonomy of discrete types, they can provide more nuanced views of gradual transitions, reflecting for instance, developmental processes [3], physical locations [4], or the cell cycle [5]. Supplementing these two-dimensional views with additional, phenotypical information (e.g. the expression level of marker genes) can be used to provide the correct

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.