Abstract
Background: Animal genomes contain thousands of long noncoding RNA (lncRNA) genes, a growing subset of which are thought to be functionally important. This functionality is often mediated by short sequence elements scattered throughout the RNA sequence that correspond to binding sites for small RNAs and RNA binding proteins. Throughout vertebrate evolution, the sequences of lncRNA genes have changed extensively, so that it is often impossible to obtain significant alignments between sequences of lncRNAs from evolutionarily distant species, even when synteny is evident. This often precludes identifying conserved lncRNAs that are likely to be functional or prioritizing constrained regions for experimental interrogation.
Results: We introduce here LncLOOM, a novel algorithmic framework for the discovery and evaluation of syntenic combinations of short motifs. LncLOOM is based on a graph representation of the input sequences and uses integer linear programming to efficiently compare dozens of sequences that have thousands of bases each and to evaluate the significance of the recovered motifs. We show that LncLOOM is capable of identifying specific, biologically relevant motifs which are conserved throughout vertebrates and beyond in lncRNAs and 3′UTRs, including novel functional RNA elements in the CHASERR lncRNA that are required for regulation of CHD2 expression.
Conclusions: We expect that LncLOOM will become a broadly used approach for the discovery of functionally relevant elements in the noncoding genome.
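The idea of "syntenic combinations of short motifs" can be illustrated with a toy example. The sketch below is not LncLOOM's implementation: it replaces the integer linear programming step with a simple dynamic program, and it assumes that each motif is a fixed-length k-mer occurring exactly once in every sequence. The function names `unique_kmer_positions` and `conserved_ordered_kmers` and the example sequences are hypothetical, chosen only for illustration.

```python
def unique_kmer_positions(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    positions = {}
    repeated = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in positions:
            repeated.add(kmer)
        positions[kmer] = i
    return {m: p for m, p in positions.items() if m not in repeated}


def conserved_ordered_kmers(seqs, k):
    """Longest chain of k-mers found once in every sequence, in the same
    relative order and without overlaps (toy stand-in for the ILP step)."""
    per_seq = [unique_kmer_positions(s, k) for s in seqs]
    shared = set(per_seq[0]).intersection(*per_seq[1:])
    if not shared:
        return []
    # Sorting by position in the first sequence gives a valid topological order.
    nodes = sorted(shared, key=lambda m: per_seq[0][m])

    def precedes(a, b):
        # a must end before b starts in every sequence.
        return all(pos[a] + k <= pos[b] for pos in per_seq)

    best_len = [1] * len(nodes)
    prev = [None] * len(nodes)
    for j in range(len(nodes)):
        for i in range(j):
            if precedes(nodes[i], nodes[j]) and best_len[i] + 1 > best_len[j]:
                best_len[j] = best_len[i] + 1
                prev[j] = i

    # Trace back from the end of the longest chain.
    j = max(range(len(nodes)), key=best_len.__getitem__)
    chain = []
    while j is not None:
        chain.append(nodes[j])
        j = prev[j]
    return chain[::-1]


# Hypothetical sequences sharing two motifs in the same order.
example = ["CCAUGCGGUUUACC", "GAUGCAAAAUUUAG", "AUGCCGCGUUUA"]
chain = conserved_ordered_kmers(example, 4)
print(chain)
```

Because the "precedes" relation is required to hold in every sequence, a motif pair whose order is swapped in even one species cannot appear together in the chain, which is the order-conservation assumption in miniature.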
Highlights
Animal genomes contain thousands of long noncoding RNA genes, a growing subset of which are thought to be functionally important.
Conserved motifs in the sequence of the CHASERR long noncoding RNA (lncRNA): To test the ability of LncLOOM to identify conserved modules in sequences that are not amenable to BLAST comparison, we focused on CHASERR, a lncRNA that we recently characterized as essential for mouse viability [28].
We show here that LncLOOM is capable of working with lncRNA and 3′UTR sequences, but importantly, it is directly applicable to other types of biological sequences for which our assumptions of motif order conservation are reasonable, such as protein sequences and sequences of DNA enhancer elements.
Summary
Animal genomes contain thousands of long noncoding RNA (lncRNA) genes, a growing subset of which are thought to be functionally important. Tens of thousands of loci in the human genome encode long noncoding RNA (lncRNA) transcripts, which do not appear to code for functional proteins [1, 2]. These genes evolve much faster than most mRNAs [3]: there are no known homologs of vertebrate lncRNAs outside of vertebrates, and only ~100 lncRNAs have detectable conservation between mammals and fish [4]. Even lncRNAs with detectable similarity across long evolutionary distances frequently exhibit drastic changes in their exon-intron structure and overall length, often through species-specific acquisition of transposable elements [4]. These changes make it difficult to predict functionally important sequence elements by comparing lncRNAs from multiple species. More sensitive approaches are needed to confidently detect specific functional elements that have been evolutionarily conserved in orthologous lncRNAs in distantly related species.