Transcription factor binding site detection using convolutional neural networks with a functional group-based data representation

Gergely Pap, László Tóth, Krisztián Ádám, Zoltán Hegedűs, Györgypál Zoltán

doi:10.1088/1742-6596/1824/1/012001

Abstract

Transcription factors (TFs) play an essential role in molecular biology by regulating gene expression. TF binding sites vary considerably, and the large number of possible binding locations makes their detection challenging. Recently, several machine learning approaches using nucleotide sequence data have been applied to classify DNA sequences with respect to Transcription Factor Binding Sites (TFBS). We propose a novel training strategy that replaces the traditional 1D nucleotide-based DNA sequence representation with a 2D topological matrix of sub-nucleotide chemical functional groups, which substantially define the protein-binding ability of DNA fragments. We train convolutional neural networks (CNNs) on this novel Functional Group DNA Representation (FGDR) to solve a TFBS classification task. We compare our results with those of previous nucleotide-based training approaches and show that learning from FGDR sequence data has several benefits for TFBS classification. Moreover, we argue that training deep neural networks on the FGDR representation produces competitive results while introducing only a pre-processing conversion step. Finally, we show that an ensemble of models trained on the nucleotide and FGDR representations achieves higher classification performance than any of the single-input approaches.
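The pre-processing conversion step can be illustrated with a minimal Python sketch. The exact layout of the FGDR matrix is not given on this page, so the encoding below is an assumption: each base pair is described by the chemical groups it exposes in the major groove (four positions) and the minor groove (three positions), using the commonly cited pattern of hydrogen-bond acceptors (A), donors (D), methyl groups (M) and nonpolar hydrogens (N). The function names, the 7-row layout and the plain probability averaging used for the ensemble are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

# Hypothetical FGDR-like encoding (an assumption, not the paper's exact layout).
# Each base pair is described by the groups it exposes in the major groove
# (4 positions) and the minor groove (3 positions).
# Group codes: A = H-bond acceptor, D = H-bond donor,
#              M = methyl group,    N = nonpolar hydrogen.
GROUPS = "ADMN"
MAJOR = {"A": "ADAM", "T": "MADA", "G": "AADN", "C": "NDAA"}
MINOR = {"A": "ANA", "T": "ANA", "G": "ADA", "C": "ADA"}


def fgdr_matrix(seq: str) -> np.ndarray:
    """Return a (7, L) matrix of group indices: rows 0-3 major groove, rows 4-6 minor groove."""
    cols = [MAJOR[b] + MINOR[b] for b in seq.upper()]  # one 7-character column per base pair
    return np.array([[GROUPS.index(g) for g in col] for col in cols], dtype=np.int64).T


def one_hot_fgdr(seq: str) -> np.ndarray:
    """One-hot the group indices into a (4, 7, L) tensor: (group channel, groove row, position)."""
    idx = fgdr_matrix(seq)
    return (np.arange(len(GROUPS))[:, None, None] == idx[None, :, :]).astype(np.float32)


def one_hot_nucleotides(seq: str) -> np.ndarray:
    """Conventional 1D nucleotide one-hot encoding of shape (4, L), for comparison."""
    idx = np.array(["ACGT".index(b) for b in seq.upper()])
    return (np.arange(4)[:, None] == idx[None, :]).astype(np.float32)


def ensemble_probability(p_nucleotide, p_fgdr) -> np.ndarray:
    """Hypothetical ensemble rule: average the binding probabilities of the two models."""
    return (np.asarray(p_nucleotide, dtype=float) + np.asarray(p_fgdr, dtype=float)) / 2.0
```

For a sequence of length L, the nucleotide one-hot input has 4L entries, while this FGDR-like tensor has 4 × 7 × L = 28L, which illustrates why FGDR is a larger input space than nucleotide data.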

Highlights

Transcription factors (TFs) are proteins that regulate gene expression and play an important role in almost all cellular physiological processes and in the related molecular mechanisms
In the last few years, earlier bioinformatics methods for identifying DNA recognition motifs, based on position weight matrices and other interpretable statistical methods, have been surpassed by machine learning approaches trained on nucleotide sequence data
Since the Functional Group DNA Representation (FGDR) has a larger input space than nucleotide data, we found that an adequately complex (deep) CNN is necessary for accurate model performance

Summary

Introduction

Transcription factors (TFs) are proteins that regulate gene expression and play an important role in almost all cellular physiological processes and in the related molecular mechanisms. TFs detect and bind the DNA double helix at TF-specific positions called DNA recognition motifs. Motifs are represented by sequential combinations of the nucleotides A, C, G and T and are typically 4-18 base pairs long. Finding and classifying these motifs is a long-standing problem in molecular and computational biology. In the last few years, earlier bioinformatics methods for identifying DNA recognition motifs, based on position weight matrices and other interpretable statistical methods, have been surpassed by machine learning approaches trained on nucleotide sequence data. CNNs trained for TFBS classification on the novel Functional Group DNA Representation (FGDR) surpass the performance of other, nucleotide sequence-based methods.
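For contrast with the learned approaches, the classic position weight matrix (PWM) idea can be sketched in a few lines of Python. This is a textbook log-odds PWM, assuming a uniform background, a pseudocount of 0.5 and a handful of toy aligned sites; the function names are hypothetical, and it is not the specific PWM-based method the paper compares against.

```python
import numpy as np

BASES = "ACGT"


def pwm_from_sites(sites, pseudocount=0.5, background=None):
    """Build a log2-odds position weight matrix (4 x motif width) from aligned binding sites."""
    width = len(sites[0])
    counts = np.full((4, width), pseudocount)  # pseudocounts avoid log(0) for unseen bases
    for site in sites:
        for j, base in enumerate(site.upper()):
            counts[BASES.index(base), j] += 1
    freqs = counts / counts.sum(axis=0, keepdims=True)  # column-wise base frequencies
    bg = np.full(4, 0.25) if background is None else np.asarray(background, dtype=float)
    return np.log2(freqs / bg[:, None])


def scan(seq, pwm):
    """Score every window of the sequence; higher scores indicate a better motif match."""
    width = pwm.shape[1]
    idx = [BASES.index(b) for b in seq.upper()]
    return np.array([
        sum(pwm[idx[i + j], j] for j in range(width))
        for i in range(len(idx) - width + 1)
    ])


# Toy usage: three aligned 7-bp sites, then scan a short sequence for the best window.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA"]
pwm = pwm_from_sites(sites)
scores = scan("GGTGACTCAGG", pwm)
best = int(np.argmax(scores))  # start index of the highest-scoring window
```

Because each window score is a sum of independent per-position terms, a PWM cannot capture dependencies between positions, which is one motivation for learned models such as CNNs.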

Journal: Journal of Physics: Conference Series
Publication date: Mar 1, 2021
Citations: 2
License type: cc-by
