Abstract
Non-coding RNAs (ncRNAs) play vital roles in translation, splicing, RNA processing, RNA modification and the regulation of gene expression. ncRNA discovery has advanced alongside the identification of new classes of ncRNAs and the invention of new sequencing platforms. High-throughput sequencing technologies greatly facilitate the study of small regulatory RNAs, which are 20 to 30 nt in length. High-throughput sequencing data of 18-26 nt small RNA fragments are a mixture of small regulatory RNAs and degradation products of coding RNAs or ncRNAs. The proper choice of computational approaches for analyzing small RNA sequencing data is crucial for distinguishing small RNAs of distinct origins, for discovering new ncRNAs and for revealing the knowledge embedded in these ncRNAs. To date, the development of computational approaches has mostly focused on the discovery of microRNAs (miRNAs). Computational approaches that use small RNA sequencing data to study other ncRNAs are much needed. This dissertation presents the development of novel bioinformatics approaches for analyzing small RNA sequencing data and shows that these analyses improve the understanding of Arabidopsis ncRNAs. In the first part, a new bioinformatics approach was developed, using abundant small RNA sequencing data from the public domain, to find trans-acting small interfering RNAs (ta-siRNAs), a new class of small regulatory RNAs. Unlike that of other siRNAs, the biogenesis of ta-siRNAs depends on cleavage directed by miRNAs. Moreover, most ta-siRNAs are clustered in 21-nt increments relative to the cleavage site. Based on this characteristic, this study developed the first computational algorithm that successfully recovered both known and novel Arabidopsis loci producing ta-siRNAs from complex small RNA sequencing data.
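The 21-nt phasing criterion described above can be illustrated with a minimal sketch. This is not the algorithm developed in the dissertation; it only shows the basic idea of measuring how many small RNA reads fall in register with a cleavage site. The function names, the input format (a list of read 5' positions on one strand) and the scoring by simple fraction are all assumptions made for illustration.

```python
from collections import Counter

PHASE = 21  # spacing of ta-siRNAs in nt, as stated in the abstract


def phase_register_counts(read_starts, cleavage_site, phase=PHASE):
    """Count reads by the offset of their 5' end from the cleavage site, modulo `phase`.

    Hypothetical helper: `read_starts` is assumed to hold 5' positions of
    small RNA reads mapped to the same strand as the cleavage site.
    """
    return Counter((pos - cleavage_site) % phase for pos in read_starts)


def in_phase_fraction(read_starts, cleavage_site, phase=PHASE):
    """Fraction of reads whose 5' end lies an exact multiple of `phase` from the site."""
    if not read_starts:
        return 0.0
    counts = phase_register_counts(read_starts, cleavage_site, phase)
    return counts[0] / len(read_starts)


# Toy data: four reads in 21-nt register with position 100, one read out of register.
toy_reads = [100, 121, 142, 163, 150]
toy_fraction = in_phase_fraction(toy_reads, cleavage_site=100)
```

A real analysis would also need to handle both strands, read abundance and a statistical test for enrichment, none of which this sketch attempts.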
A group of the newly identified ta-siRNAs is produced by cleavage directed by a ta-siRNA rather than by a miRNA as previously reported. The results indicate the existence of a small RNA regulatory cascade initiated by miRNA-directed cleavage and followed by the consecutive production of ta-siRNAs. The second part focuses on the use of small RNA sequencing data in the annotation of small nucleolar RNAs (snoRNAs). Small RNAs derived from snoRNAs are often considered to be degradation products and were filtered out without further analysis in previous studies. However, the analysis of Arabidopsis small RNA sequencing data revealed an enrichment of small RNAs at the termini of snoRNAs. Using this feature, this study developed a new method that was able to re-annotate known snoRNAs lacking well-defined termini and to discover novel snoRNA species. The newly found snoRNAs also suggest that there are additional RNA modification sites on Arabidopsis ribosomal RNAs and spliceosomal small nuclear RNAs. This research demonstrates that, by combining existing biological knowledge with appropriate mining approaches, small RNA sequencing data represent a rich resource for the study of small regulatory RNAs as well as other ncRNAs.
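The terminal enrichment of small RNAs at snoRNA ends can likewise be illustrated with a minimal sketch. It is not the dissertation's method; the function name, the inclusive coordinate convention and the 5-nt window are assumptions chosen only to show how such a signal might be quantified for one candidate locus.

```python
def terminal_fraction(reads, locus_start, locus_end, window=5):
    """Fraction of reads with an end lying near a terminus of the locus.

    Hypothetical helper: `reads` is a list of (read_start, read_end) pairs,
    inclusive coordinates, assumed to map to the same strand as the locus.
    A read counts as terminal if its start is within `window` nt of
    `locus_start` or its end is within `window` nt of `locus_end`.
    """
    if not reads:
        return 0.0
    terminal = sum(
        1
        for read_start, read_end in reads
        if abs(read_start - locus_start) <= window
        or abs(read_end - locus_end) <= window
    )
    return terminal / len(reads)


# Toy data: three reads anchored at the termini of a locus spanning 100-200,
# and two internal reads.
toy_reads = [(100, 120), (102, 123), (150, 170), (180, 200), (130, 151)]
toy_score = terminal_fraction(toy_reads, locus_start=100, locus_end=200)
```

Scanning candidate boundaries for the positions that maximize such a score is one conceivable way to refine poorly defined termini, though the abstract does not say how the actual re-annotation was performed.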