Abstract

Background: Efforts to resolve the transcribed sequences in the equine genome have focused on protein-coding RNA. The transcription of the intergenic regions, although detected via total RNA sequencing (RNA-seq), has yet to be characterized in the horse. The most recent equine transcriptome based on RNA-seq from several tissues was a prime opportunity to obtain a concurrent long non-coding RNA (lncRNA) database.

Results: This lncRNA database has a breadth of eight tissues and a depth of over 20 million reads for select tissues, providing the deepest and most expansive equine lncRNA database. Utilizing the intergenic reads and three categories of novel genes from a previously published equine transcriptome pipeline, we better describe these groups by annotating the lncRNA candidates. These lncRNA candidates were filtered using an approach adapted from human lncRNA annotation, which removes transcripts based on size, expression, protein-coding capability and distance to the start or stop of annotated protein-coding transcripts.

Conclusion: Our equine lncRNA database has 20,800 transcripts that demonstrate characteristics unique to lncRNA including low expression, low exon diversity and low levels of sequence conservation. These candidate lncRNA will serve as a baseline lncRNA annotation and begin to describe the RNA-seq reads assigned to the intergenic space in the horse.
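The four filtering criteria named in the abstract (size, expression, protein-coding capability, and distance to annotated protein-coding transcripts) can be sketched as a sequence of simple checks. This is only an illustration under stated assumptions: the `Transcript` fields, the function names, every threshold value, and the direction of each comparison are hypothetical and are not taken from the paper's pipeline.

```python
from dataclasses import dataclass
from typing import List

# All thresholds are illustrative assumptions, NOT values reported in the paper.
MIN_LENGTH_NT = 200               # assumed minimum transcript length
MIN_EXPRESSION = 0.1              # assumed minimum expression (arbitrary units)
MAX_CODING_SCORE = 0.5            # assumed cutoff on a 0-1 coding-potential score
MIN_DISTANCE_TO_CODING_BP = 1000  # assumed minimum distance to a coding transcript


@dataclass
class Transcript:
    """Hypothetical record for one candidate transcript."""
    transcript_id: str
    length_nt: int
    expression: float
    coding_score: float          # higher means more likely protein-coding
    distance_to_coding_bp: int   # distance to nearest annotated coding start/stop


def is_lncrna_candidate(t: Transcript) -> bool:
    """Return True if the transcript passes all four assumed filters."""
    return (
        t.length_nt >= MIN_LENGTH_NT
        and t.expression >= MIN_EXPRESSION
        and t.coding_score < MAX_CODING_SCORE
        and t.distance_to_coding_bp >= MIN_DISTANCE_TO_CODING_BP
    )


def filter_candidates(transcripts: List[Transcript]) -> List[Transcript]:
    """Keep only transcripts that pass every filter, preserving input order."""
    return [t for t in transcripts if is_lncrna_candidate(t)]


if __name__ == "__main__":
    demo = [
        Transcript("kept", 850, 2.3, 0.05, 12000),
        Transcript("too_short", 150, 2.3, 0.05, 12000),
        Transcript("likely_coding", 850, 2.3, 0.90, 12000),
    ]
    for t in filter_candidates(demo):
        print(t.transcript_id)
```

Applying the filters as one conjunction keeps the sketch short; a real pipeline would more likely apply them stage by stage so that the number of transcripts removed at each step can be reported.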

Highlights

  • Efforts to resolve the transcribed sequences in the equine genome have focused on protein-coding RNA

  • Over 4,000 long non-coding RNA (lncRNA) transcripts predicted by ENSEMBL and NCBI are represented in our transcriptome, and we considered them as input for our annotation pipeline

  • Input categories of reads: the initial inputs into this lncRNA pipeline were direct products of the transcriptome annotation pipeline based on RNA sequencing (RNA-seq) from eight equine tissues originating from 59 horses: brainstem, cerebellum, spinal cord, retina, skeletal muscle, skin, and embryo inner cell mass (ICM) and trophectoderm (TE) [10]

Summary

Introduction

Efforts to resolve the transcribed sequences in the equine genome have focused on protein-coding RNA. LncRNA are often found in low abundance compared to protein-coding genes [4] and exhibit shorter transcript sizes and less exon diversity [5]. Due to their low sequence conservation across species [6], their tissue-specific nature within species [7], and a lack of knowledge regarding their function, lncRNA are difficult to identify and validate. They have been shown to exhibit more variability in expression than protein-coding genes [8], and the number of lncRNA detected increases when more individuals are used to build the lncRNA database [9]. Having transcript expression profiles from several tissues collected from multiple individuals is therefore paramount for detecting the maximum number of lncRNA.
