Feature Selection for Topological Proximity Prediction of Single-Cell Transcriptomic Profiles in Drosophila Embryo Using Genetic Algorithm

Shruti Gupta, Shandar Ahmad, Ajay Kumar Verma

doi:10.3390/genes12010028

Open Access

Journal: Genes
Publication Date: Dec 28, 2020
Citations: 4
License type: CC BY 4.0

Affiliation: Jawaharlal Nehru University

Abstract

Single-cell transcriptomics data, when combined with in situ hybridization patterns of specific genes, can help recover the spatial information lost during cell isolation. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) consortium conducted a crowd-sourced competition, the DREAM Single Cell Transcriptomics Challenge (SCTC), to predict the masked locations of single cells using sets of 60, 40 and 20 genes selected from the 84 in situ gene patterns known in the Drosophila embryo. We applied a genetic algorithm (GA), in combination with the base distance mapping algorithm DistMap, to identify the genes that carry the most positional and proximity information about the single-cell origins. The resulting gene selection performed well and ranked among the top 10 in two of the three sub-challenges. However, the details of the method did not make it into the main challenge publication because of an intricate aggregation ranking. In this work, we discuss the detailed implementation of the GA and its post-challenge parameterization, with a view to identifying potential areas where GA-based approaches to gene-set selection for topological association prediction may be made more effective. We believe this work provides additional insights into feature-selection strategies and their relevance to single-cell similarity prediction, and will form a strong addendum to the recently published work from the consortium.
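
The gene-selection step described in the abstract can be illustrated with a minimal sketch of a genetic algorithm over fixed-size gene subsets. This is not the authors' implementation: the function names, the operators (truncation selection, union-based crossover, swap mutation), the default parameters, and the toy fitness are all assumptions made for illustration. In the actual method the fitness would score DistMap location predictions obtained from the candidate subset, which is not reproduced here.

```python
import random

def random_subset(n_genes, k, rng):
    """Draw k distinct gene indices out of n_genes."""
    return frozenset(rng.sample(range(n_genes), k))

def crossover(a, b, k, rng):
    """Child keeps the genes shared by both parents and fills up to k
    with genes drawn at random from the rest of their union."""
    shared = a & b
    pool = sorted((a | b) - shared)
    rng.shuffle(pool)
    return frozenset(list(shared) + pool[: k - len(shared)])

def mutate(subset, n_genes, rate, rng):
    """Swap mutation: each selected gene may be exchanged for an
    unselected one, so the subset size never changes."""
    genes = set(subset)
    for g in sorted(subset):
        if rng.random() < rate:
            outside = [x for x in range(n_genes) if x not in genes]
            if not outside:
                break
            genes.remove(g)
            genes.add(rng.choice(outside))
    return frozenset(genes)

def select_genes(fitness, n_genes=84, k=20, pop_size=30,
                 generations=40, mutation_rate=0.05, seed=0):
    """Evolve subsets of size k; higher fitness is better."""
    rng = random.Random(seed)
    population = [random_subset(n_genes, k, rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: pop_size // 2]  # truncation selection with elitism
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = crossover(a, b, k, rng)
            children.append(mutate(child, n_genes, mutation_rate, rng))
        population = elite + children
    return max(population, key=fitness)

def toy_fitness(subset, informative=frozenset(range(20))):
    """Hypothetical stand-in for a DistMap-based score: counts how many
    selected genes fall in an arbitrary 'informative' set."""
    return len(subset & informative)

best = select_genes(toy_fitness, k=20)
print(sorted(best), toy_fitness(best))
```

Representing each candidate as a set of exactly k indices, rather than a free bit string over 84 genes, keeps every individual valid for a sub-challenge with a fixed gene budget (60, 40 or 20), so no repair step is needed after crossover or mutation.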

Highlights

Advances in next-generation sequencing (NGS) methods, coupled with cell sorting and culturing, have made it possible to study the precise transcriptomic profiles of individual cells
In this manuscript, we present the use of a genetic algorithm, followed by gene-ontology analysis of the selected features, to identify the most important of the 84 in situ genes and to help model topological locations based on a gold-standard mapping
We observe that the best-performing approach of the Single Cell Transcriptomics Challenge (SCTC), called Thin Nguyen (TN) in this annotation, was crucial: when combined with the genetic algorithm (GA)-based features selected by our method, it outperforms all the scores, most significantly score 2 of the Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge in the case of 60 selected features

Summary

Introduction

Advances in next-generation sequencing (NGS) methods, coupled with cell sorting and culturing, have made it possible to study the precise transcriptomic profiles of individual cells. Many of the challenges in scRNAseq data analysis can be effectively addressed by improved computational strategies that cluster single-cell expression profiles even when reliable values are unavailable for all genes in most of the entities to be clustered. The development of such computational strategies requires rigorous benchmarking on datasets and systems with well-characterized biological contexts. The missing values belong to different gene sets in each cell and for each measurement of an expression profile, further complicating the problem of reconstructing them. To alleviate this problem, and other undesirable attributes of the high-dimensional feature space of scRNAseq data, a priori feature-selection methods are applied before clustering and downstream analysis of the dataset to identify informative genes and improve clustering results. In this paper, we use the terms gold standard and silver standard interchangeably; both refer to the DREAM challenge benchmarks, on which different methods have tried to perform the best.
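
As a simple illustration of a priori feature selection in the presence of missing values, the sketch below drops genes detected in too few cells and ranks the remainder by variance. It is a generic filter under stated assumptions, not the method used in this paper: the function name, the thresholds, the use of `None` to mark a missing value, and the gene names are all hypothetical.

```python
from statistics import pvariance

def filter_genes(expr, min_detected=0.5, top_n=2):
    """expr maps a gene name to its per-cell values; None marks a missing value.
    Genes observed in fewer than min_detected of the cells are dropped, and
    the top_n most variable remaining genes are returned."""
    scored = []
    for gene, values in expr.items():
        observed = [v for v in values if v is not None]
        # Skip genes with too few observations to estimate a variance,
        # or detected in too small a fraction of cells.
        if len(observed) < 2 or len(observed) / len(values) < min_detected:
            continue
        scored.append((pvariance(observed), gene))
    scored.sort(reverse=True)
    return [gene for _, gene in scored[:top_n]]

# Hypothetical toy matrix: four cells, four genes.
expr = {
    "geneA": [0, 5, 10, None],
    "geneB": [1, 1, 1, 1],
    "geneC": [None, None, None, 2],
    "geneD": [0, 1, None, 3],
}
selected = filter_genes(expr)
print(selected)
```

Because the variance is computed only over observed values, a gene is not penalized or rewarded for where its missing entries happen to fall, which matters when the missing values belong to different gene sets in each cell.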

DREAM Dataset Description

Selection of Gene Sets

Data Preprocessing

Training Model

Genetic Algorithm

Fitness Function

Metric-1 Based on Root Mean Squared Deviation

Metric-2 Based on Spearman Correlation

Metric-3 Based on Jaccard Index

Metric-4 Based on Euclidean Distance

Final Fitness Function

Parameters

Post Competition Assessment of GA Hyperparameters

Creating Baseline Gene Sets to Evaluate Performance Gains in a Complex Method

Prediction Methods

GA Optimization of Fixed Sized Gene Sets

Feature Selection versus Location Assignment

Comparison with Other Gene Sets

Method

Parameter Evaluation Post DREAM Challenge

Discussion

Conclusions
