Learning the Regulatory Code of Gene Expression.

Jan Zrimec, Aleksej Zelezniak, Mariia Kokina, Filip Buric, Victor Garcia

doi:10.3389/fmolb.2021.673363

Abstract

Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.

Highlights

Genetic information is stored and encoded in genes that produce an organism’s phenotype by being expressed through multiple biochemical processes into a variety of functional molecules
Dependence on accurately labeled data: models cannot achieve higher accuracy than the noise inherent in the experimental target labels allows (Li et al., 2019b; Barshai et al., 2020)
Multiple methods exist for interpreting deep models; many are a work in progress, and no explicit solutions currently exist to benchmark these methods or to combine their findings into more complete and coherent interpretations (Azodi et al., 2020)
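
The highlight on label noise can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the paper: measurement noise is additive, Gaussian, and independent of the true expression value, and accuracy is scored as the squared Pearson correlation (R²). Under those assumptions, even an oracle model that predicts the true value exactly cannot score above signal variance divided by total variance. The function names and parameter values are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_noise_ceiling(n=20000, signal_sd=1.0, noise_sd=0.5, seed=0):
    """Return (observed_r2, expected_r2) for an oracle predictor.

    Assumption: measured label = true value + independent Gaussian noise.
    """
    rng = random.Random(seed)
    true_values = [rng.gauss(0.0, signal_sd) for _ in range(n)]
    measured = [t + rng.gauss(0.0, noise_sd) for t in true_values]

    # The oracle "model" outputs the true values, yet it is scored
    # against the noisy measured labels.
    observed_r2 = pearson_r(true_values, measured) ** 2

    # Theoretical ceiling under the additive-noise assumption.
    expected_r2 = signal_sd ** 2 / (signal_sd ** 2 + noise_sd ** 2)
    return observed_r2, expected_r2


if __name__ == "__main__":
    observed, expected = simulate_noise_ceiling()
    print(f"oracle R^2 against noisy labels: {observed:.3f}")
    print(f"theoretical ceiling:             {expected:.3f}")
```

With the default settings the theoretical ceiling is 1.0 / (1.0 + 0.25) = 0.8, so a reported R² close to that value on such data would already be near the best achievable, whatever the model.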

Summary

INTRODUCTION

Genetic information is stored and encoded in genes that produce an organism’s phenotype by being expressed through multiple biochemical processes into a variety of functional molecules. We detail the current understanding of the regulatory grammar carried within the specific coding and non-coding regulatory regions, and its involvement in defining transcript and protein abundance. Based on these principles, we review advanced modeling approaches that use multiple different parts of the gene regulatory structure or whole nucleotide sequences, demonstrating how this increases their predictive power.
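
As an illustration of how a nucleotide sequence can be handed to a learning model, the sketch below one-hot encodes a DNA string. One-hot encoding is a common input representation for sequence models and is assumed here for illustration; this summary does not state which encoding any reviewed method uses. The function name and the choice to map ambiguous bases such as N to an all-zero row are hypothetical.

```python
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Encode a DNA string as a list of 4-element rows in A, C, G, T order.

    Assumption: any character outside A/C/G/T (for example N) becomes
    an all-zero row rather than raising an error.
    """
    index = {nt: i for i, nt in enumerate(NUCLEOTIDES)}
    encoded = []
    for nt in sequence.upper():
        row = [0, 0, 0, 0]
        if nt in index:
            row[index[nt]] = 1
        encoded.append(row)
    return encoded


if __name__ == "__main__":
    for base, row in zip("ACGTN", one_hot_encode("ACGTN")):
        print(base, row)
```

The result is a length-by-4 matrix, the kind of fixed-alphabet numeric input that shallow or deep models can consume directly.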

Method

5′ Untranslated Region

3′ Untranslated Region and Terminator

Findings

DISCUSSION

Journal: Frontiers in Molecular Biosciences
Publication Date: Jun 10, 2021
Citations: 20
License type: CC BY 4.0
