Abstract

While long non-coding RNA (lncRNA) research has in the past focused primarily on the discovery of novel genes, it has now shifted towards functional annotation of this large class of genes. With thousands of lncRNA studies published every year, the current challenge lies in keeping track of which lncRNAs are functionally described. This is further complicated by the fact that lncRNA nomenclature is not straightforward and lncRNA annotation is scattered across different resources, each with its own quality metrics and definition of a lncRNA. To overcome this issue, large-scale curation and annotation are needed. Here, we present the fifth release of the human lncRNA database LNCipedia (https://lncipedia.org). The most notable improvements include manual literature curation of 2482 lncRNA articles and the use of official gene symbols when available. In addition, an improved filtering pipeline results in a higher quality reference lncRNA gene set.

Highlights

  • The human genome is pervasively transcribed, producing vast amounts of RNA transcripts, of which the majority does not encode protein [1]

  • Notable examples are LncRNAWiki, a wiki-based resource that combines computational and manual curation [7,8] and NONCODE, a long non-coding RNA (lncRNA) annotation database covering 17 species of which human and mouse have the highest number of annotations [9]

  • Several research groups have turned to manual literature curation to annotate lncRNA with functional evidence or aberrant expression in disease contexts. Notable examples of such datasets are Lnc2Cancer [13], LncRNADisease [14], the recently published pan-cancer lncRNA co-expression atlas LncMAP [15] and the Mammal ncRNA Disease Repository (MNDR) that stores 3213 mammalian lncRNAs associated with diseases [16]


Summary

INTRODUCTION

The human genome is pervasively transcribed, producing vast amounts of RNA transcripts, of which the majority does not encode protein [1]. Several research groups have turned to manual literature curation to annotate lncRNA with functional evidence or aberrant expression in disease contexts. Notable examples of such datasets are Lnc2Cancer [13], LncRNADisease [14], the recently published pan-cancer lncRNA co-expression atlas LncMAP [15] and the Mammal ncRNA Disease Repository (MNDR) that stores 3213 mammalian lncRNAs associated with diseases [16]. LNCipedia offers a complete set of human lncRNAs without compromising the quality of the annotations. An example of this is the high-confidence gene set introduced in LNCipedia 3 [20] as a subset of the database with lncRNAs that lack coding potential by any metric. Following the release of valuable resources such as FANTOM CAT [21], we expanded our database with new lncRNAs. In addition to several small improvements, we introduced an improved filtering pipeline and support for official HGNC gene names. 23% of the transcripts and 6% of the genes in LNCipedia are annotated with an official gene symbol.

