Abstract

Non-coding RNAs (ncRNAs) constitute an important set of transcripts produced in the cells of organisms. Among them, a particular class of long ncRNAs that is abundant and difficult to predict, the so-called long intergenic ncRNAs (lincRNAs), might play essential roles in gene regulation and other cellular processes. Despite the importance of these lincRNAs, biological knowledge about them is still lacking, and the few computational methods currently available are so specific that they cannot be successfully applied to species other than those for which they were originally designed. Long ncRNAs (lncRNAs) have been predicted with machine learning techniques; for lincRNA prediction in particular, supervised learning methods have been explored in the recent literature. As far as we know, there are no methods or workflows specifically designed to predict lincRNAs in plants. In this context, this work proposes a workflow to predict lincRNAs in plants that combines known bioinformatics tools with a machine learning technique, here a support vector machine (SVM). We discuss two case studies that allowed us to identify novel lincRNAs in sugarcane (Saccharum spp.) and in maize (Zea mays). From the results, we could also identify differentially expressed lincRNAs in sugarcane and maize plants exposed to pathogenic and beneficial microorganisms.

Highlights

  • The development of new high-throughput sequencing techniques and the large number of sequencing projects have created enormous volumes of biological data [1], revealing an increasing number of non-coding RNAs in eukaryotic genomes [2]

  • Non-coding RNAs (ncRNAs) fall into two distinct classes: (a) small RNAs, the well-known structured RNAs with lengths between 20 and 30 nucleotides, and (b) long ncRNAs, which are longer than 200 nucleotides, have a poor capacity to code for proteins, and represent the least understood transcripts today [4,5,6]

  • No sugarcane genome is publicly available in biological databases, and no information about its lincRNAs is known

Introduction

The development of new high-throughput sequencing techniques and the large number of sequencing projects have created enormous volumes of biological data [1], revealing an increasing number of non-coding RNAs (ncRNAs) in eukaryotic genomes [2]. These ncRNAs act directly in cellular structures, as well as in catalytic and regulatory processes [3].
