RegEl corpus: identifying DNA regulatory elements in the scientific literature

Samuele Garda, Stefanie Hochmuth, Dominik Seelow, Freyda Lenihan-Geels, Markus Schülke, Sebastian Proft, Ulf Leser

doi:10.1093/database/baac043

Abstract

High-throughput technologies have led to the generation of a wealth of data on regulatory DNA elements in the human genome. However, results from disease-driven studies are primarily shared in textual form as scientific articles. Information extraction (IE) algorithms allow this information to be (semi-)automatically accessed. Their development, however, depends on the availability of annotated corpora. Therefore, we introduce RegEl (Regulatory Elements), the first freely available corpus annotated with regulatory DNA elements, comprising 305 PubMed abstracts for a total of 2690 sentences. We focus on enhancers, promoters and transcription factor binding sites. Three annotators worked in two stages, achieving an overall 0.73 F1 inter-annotator agreement and 0.46 for regulatory elements. Depending on the entity type, IE baselines reach F1-scores of 0.48–0.91 for entity detection and 0.71–0.88 for entity normalization. Next, we apply our entity detection models to the entire PubMed collection and extract co-occurrences of genes or diseases with regulatory elements. This generates large collections of regulatory elements associated with 137 870 unique genes and 7420 diseases, which we make openly available.

Database URL: https://zenodo.org/record/6418451#.YqcLHvexVqg

Journal: Database	Publication Date: Jun 27, 2022
Citations: 1	License type: CC BY-NC 4.0

Similar Papers

Cell Type-Specific Chromatin Signatures Underline Regulatory DNA Elements in Human Induced Pluripotent Stem Cells and Somatic Cells
Ming-Tao Zhao ... Fereshteh Jahanbani
Circulation Research | VOL. 121
13 Oct 2017

TTS Mapping: integrative WEB tool for analysis of triplex formation target DNA Sequences, G-quadruplets and non-protein coding regulatory DNA elements in the human genome
Piroon Jenjaroenpun ... Vladimir A Kuznetsov
BMC Genomics | VOL. 10
01 Jan 2009

Efficient inversions and duplications of mammalian regulatory DNA elements and gene clusters by CRISPR/Cas9.
Jinhuan Li ... Ya Guo
Journal of Molecular Cell Biology | VOL. 7
10 Mar 2015

EMERGE: a flexible modelling framework to predict genomic regulatory elements from genomic signatures.
Karel Van Duijvenboden ... Jan M Ruijter
Nucleic acids research | VOL. 44
03 Nov 2015
