Abstract

Single-cell transcriptomics data, when combined with in situ hybridization patterns of specific genes, can help recover the spatial information lost during cell isolation. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) consortium conducted a crowd-sourced competition, the DREAM Single Cell Transcriptomics Challenge (SCTC), to predict the masked locations of single cells from sets of 60, 40 and 20 genes out of the 84 in situ gene patterns known in the Drosophila embryo. We applied a genetic algorithm (GA), in combination with the base distance mapping algorithm DistMap, to predict the most important genes that carry positional and proximity information about the single-cell origins. The resulting gene selection performed well and was ranked among the top 10 in two of the three sub-challenges. However, the details of the method did not make it into the main challenge publication, owing to an intricate aggregation ranking. In this work, we discuss the detailed implementation of the GA and its post-challenge parameterization, with a view to identifying areas where GA-based approaches to gene-set selection for topological association prediction may be made more effective. We believe this work provides additional insights into feature-selection strategies and their relevance to single-cell similarity prediction, and that it will form a strong addendum to the recently published work from the consortium.

Highlights

  • The advancement in next-generation sequencing (NGS) methods, coupled with cell sorting and culturing, has made it possible to study the precise transcriptomic profiles of individual cells.

  • To identify the most important of the 84 in situ genes and to help model topological locations based on a gold-standard mapping, we present in this manuscript the use of a genetic algorithm, followed by gene-ontology analysis of the selected features.

  • We observe that the best-performing approach of the Single Cell Transcriptomics Challenge (SCTC), referred to here as Thin Nguyen (TN), was crucial: when combined with the genetic algorithm (GA)-based features selected by our method, it outperforms on all the scores, most significantly on score 2 of the Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge in the case of 60-feature selections.

Summary

Introduction

The advancement in next-generation sequencing (NGS) methods, coupled with cell sorting and culturing, has made it possible to study the precise transcriptomic profiles of individual cells. Many of the above challenges in scRNAseq data analysis can be effectively addressed by improved computational strategies that cluster single-cell expression profiles in the absence of reliable values for all genes in most of the entities to be clustered. The development of such computational strategies requires rigorous benchmarking on datasets and systems with well-characterized biological contexts. The missing values belong to different gene sets in each cell and for each measurement of the expression profile, which further complicates the problem of reconstructing them. To alleviate this problem, and other undesirable attributes of the high-dimensional feature space of scRNAseq data, a priori feature-selection methods are applied before clustering and downstream analysis to identify informative genes and so improve the clustering results. In this paper we use the terms gold standard and silver standard interchangeably; both refer to the DREAM challenge benchmarks, on which different methods have competed to perform best.
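To make the idea of GA-based selection of a fixed-size gene set concrete, the following is a minimal sketch in Python under stated assumptions, not the implementation used in the challenge. The data are synthetic, and the fitness function (correlation between pairwise expression distance and pairwise spatial distance) is a hypothetical stand-in for the DistMap-based metrics listed in the outline below. All function names and parameter values (`make_toy_data`, `run_ga`, population size, mutation rate) are illustrative choices.

```python
import random


def make_toy_data(n_cells=30, n_genes=84, n_informative=10, seed=0):
    """Synthetic stand-in data: the first n_informative genes track a 1-D position."""
    rng = random.Random(seed)
    positions = [rng.random() for _ in range(n_cells)]
    expr = [[pos * (g + 1) + rng.gauss(0, 0.1) if g < n_informative
             else rng.gauss(0, 1.0)
             for g in range(n_genes)]
            for pos in positions]
    return positions, expr


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def fitness(genes, positions, expr):
    """Hypothetical fitness: agreement of expression distance with spatial distance."""
    genes = sorted(genes)  # fixed order keeps float sums reproducible
    spatial, expression = [], []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            spatial.append(abs(positions[i] - positions[j]))
            expression.append(
                sum((expr[i][g] - expr[j][g]) ** 2 for g in genes) ** 0.5)
    return pearson(spatial, expression)


def crossover(a, b, k, rng):
    """Child draws k distinct genes from the union of both parents."""
    return frozenset(rng.sample(sorted(set(a) | set(b)), k))


def mutate(subset, n_genes, rate, rng):
    """Swap each selected gene for an unselected one with probability `rate`."""
    genes = set(subset)
    for g in sorted(subset):
        if rng.random() < rate:
            candidates = [c for c in range(n_genes) if c not in genes]
            genes.remove(g)
            genes.add(rng.choice(candidates))
    return frozenset(genes)


def run_ga(positions, expr, k=20, pop_size=20, generations=15, rate=0.1, seed=1):
    """Evolve fixed-size gene subsets; the top half survives each generation."""
    rng = random.Random(seed)
    n_genes = len(expr[0])
    cache = {}

    def score(subset):
        if subset not in cache:
            cache[subset] = fitness(subset, positions, expr)
        return cache[subset]

    population = [frozenset(rng.sample(range(n_genes), k)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        history.append(score(ranked[0]))
        elite = ranked[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(crossover(p1, p2, k, rng), n_genes, rate, rng))
        population = elite + children
    return max(population, key=score), history


positions, expr = make_toy_data()
best, history = run_ga(positions, expr, k=20)
print(sorted(best), round(history[-1], 3))
```

Because crossover and mutation both preserve the subset size, the same loop can be rerun with `k=60`, `k=40` or `k=20` to mirror the three sub-challenges. Keeping the top half of each generation unchanged means the best fitness never decreases from one generation to the next.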

DREAM Dataset Description
Selection of Gene Sets
Data Preprocessing
Training Model
Genetic Algorithm
Fitness Function
Metric-1 Based on Root Mean Squared Deviation
Metric-2 Based on Spearman Correlation
Metric-3 Based on Jaccard Index
Metric-4 Based on Euclidean Distance
Final Fitness Function
Parameters
Post Competition Assessment of GA Hyperparameters
Creating Baseline Gene Sets to Evaluate Performance Gains in a Complex Method
Prediction Methods
GA Optimization of Fixed Sized Gene Sets
Feature Selection versus Location Assignment
Comparison with Other Gene Sets
Method
Parameter Evaluation Post DREAM Challenge
Discussion
Conclusions
Findings
A Next Generation Connectivity Map
