Prediction accuracy of regulatory elements from sequence varies by functional sequencing technique.

Ronald J Nowling, Kimani Njoya, Michelle M Riehle, John G Peters

doi:10.3389/fcimb.2023.1182567

Abstract

Various sequencing-based approaches are used to identify and characterize the activities of cis-regulatory elements in a genome-wide fashion. Some of these techniques rely on indirect markers such as histone modifications (ChIP-seq with histone antibodies) or chromatin accessibility (ATAC-seq, DNase-seq, FAIRE-seq), while other techniques use direct measures such as episomal assays measuring the enhancer properties of DNA sequences (STARR-seq) and direct measurement of the binding of transcription factors (ChIP-seq with transcription factor-specific antibodies). The activities of cis-regulatory elements such as enhancers, promoters, and repressors are determined by their sequence and by secondary processes such as chromatin accessibility, DNA methylation, and bound histone markers. Here, machine learning models are employed to evaluate the accuracy with which cis-regulatory elements identified by various commonly used sequencing techniques can be predicted from their underlying sequence alone, in order to distinguish cis-regulatory activity that reflects sequence content from activity that reflects secondary processes. Models trained and evaluated on D. melanogaster sequences identified through DNase-seq and STARR-seq are significantly more accurate than models trained on sequences identified by H3K4me1, H3K4me3, and H3K27ac ChIP-seq, FAIRE-seq, and ATAC-seq. These results suggest that the activity detected by DNase-seq and STARR-seq can be largely explained by underlying DNA sequence, independent of secondary processes. Experimentally, a subset of DNase-seq and H3K4me1 ChIP-seq sequences were tested for enhancer activity using luciferase assays and compared with previous tests performed on STARR-seq sequences. The experimental data indicated that STARR-seq sequences are substantially enriched for enhancer-specific activity, while the DNase-seq and H3K4me1 ChIP-seq sequences are not. Taken together, these results indicate that the DNase-seq approach identifies a broad class of regulatory elements, of which enhancers are a subset, and that the associated data are appropriate for training models to detect regulatory activity from sequence alone; that STARR-seq data are best for training enhancer-specific sequence models; and that H3K4me1 ChIP-seq data are not well suited for training and evaluating sequence-based models for cis-regulatory element prediction.
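To make "predicted from sequence alone" concrete, the sketch below shows one common way such an evaluation could be set up: each sequence is turned into k-mer frequency features, and a model's scores are summarized with the area under the ROC curve (AUC). The abstract does not state which features, models, or metrics the authors used, so the k-mer representation, the AUC metric, and the function names `kmer_vocab`, `kmer_features`, and `auc` are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def kmer_vocab(k):
    """All DNA k-mers over A, C, G, T in a fixed lexicographic order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def kmer_features(seq, k=3):
    """Return the relative frequency of every k-mer in seq.

    Hypothetical featurization: windows containing characters other
    than A, C, G, T (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocab = kmer_vocab(k)
    total = sum(counts[kmer] for kmer in vocab) or 1  # avoid division by zero
    return [counts[kmer] / total for kmer in vocab]


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    scores higher than a randomly chosen negative (label 0); ties count 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

In a setup like this, the feature vectors for sequences from each assay (for example DNase-seq peaks versus background regions) would be passed to a classifier of the reader's choice, and the per-assay AUC values on held-out sequences would then be compared.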

Journal: Frontiers in Cellular and Infection Microbiology
Publication Date: Aug 2, 2023
License type: CC BY 4.0

Similar Papers

Author response: PI3K signaling specifies proximal-distal fate by driving a developmental gene regulatory network in SOX9+ mouse lung progenitors
Sharlene Fernandes ... Matthew C Gillen
14 Jun 2022

In silico analysis of cis-acting regulatory elements in 5′ regulatory regions of sucrose transporter gene families in rice (Oryza sativa Japonica) and Arabidopsis thaliana
Omodele Ibraheem ... Graeme Bradley
Computational Biology and Chemistry | VOL. 34
06 Oct 2010

Decision letter: Single-cell multiomic profiling of human lungs reveals cell-type-specific and age-dynamic control of SARS-CoV2 host genes
Stijn De Langhe ... Edward E Morrisey
11 Oct 2020

Author response: Single-cell multiomic profiling of human lungs reveals cell-type-specific and age-dynamic control of SARS-CoV2 host genes
Allen Wang ...
02 Nov 2020
