Abstract

A growing body of evidence indicates that microRNAs play important roles in many cellular processes and that their dysregulation contributes to several diseases, including cancers. Advances in massively parallel sequencing, or Next-Generation Sequencing (NGS), have improved the detection of low-abundance sequences and therefore the ability to find microRNAs expressed at low levels. We constructed a small RNA pipeline that identifies novel microRNAs using miRDeep2 and other tools and that can be run entirely on Galaxy through a friendly graphical user interface. After constructing the pipeline, we obtained several small RNA-seq datasets of breast cancer, each containing enough samples for statistical analysis, for novel microRNA identification. miRDeep2 is an integrated tool that uses a probabilistic model of microRNA biogenesis to score the pattern and frequency of sequenced RNAs against the secondary structure of the microRNA precursor in order to discover novel microRNAs. This approach has been widely used and has been validated with high accuracy. Because processing NGS data is demanding in time and storage and involves multiple tools, Galaxy, a bioinformatics framework made by USCS, was introduced into the pipeline as a data integration and analysis platform. As a result, a few hundred candidates received a miRDeep2 score above 0. However, considering the number of candidates and the estimated probability that they are true novel microRNAs, we set a lower score bound of 5 and used the 50 predicted candidates that passed this threshold in further analysis; this corresponds to 39±3 true positives out of 50, a probability of 78±7%. Candidates that mapped to other functional RNAs by BLAST were then discarded. Finally, 30 candidate novel microRNAs remained.
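The two filtering steps described above (a miRDeep2 score cutoff followed by removal of candidates that match other functional RNAs) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the pipeline used in the study: the `Candidate` record, the `filter_candidates` helper, the candidate names, and the idea of passing BLAST matches in as a set of names are all hypothetical, and the cutoff is assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one predicted precursor (not a miRDeep2 type)."""
    name: str
    score: float  # miRDeep2 score reported for this candidate


def filter_candidates(candidates, blast_hits, min_score=5.0):
    """Keep candidates at or above min_score that have no BLAST match.

    blast_hits is assumed to be a set of candidate names that mapped to
    other functional RNAs; how that set is produced is outside this sketch.
    """
    passed = [c for c in candidates if c.score >= min_score]
    return [c for c in passed if c.name not in blast_hits]


def true_positive_fraction(expected_true, total):
    """Fraction of candidates expected to be true positives."""
    return expected_true / total


if __name__ == "__main__":
    # Made-up example values, purely for illustration.
    candidates = [
        Candidate("cand_1", 12.3),
        Candidate("cand_2", 5.0),
        Candidate("cand_3", 4.9),
        Candidate("cand_4", 8.1),
    ]
    blast_hits = {"cand_4"}  # assumed to match another functional RNA
    kept = filter_candidates(candidates, blast_hits)
    print([c.name for c in kept])
    # The abstract's central estimate: 39 true positives out of 50.
    print(f"{true_positive_fraction(39, 50):.0%}")
```

With the abstract's figures, the central estimate of 39 true positives among the 50 candidates above the cutoff gives the stated 78%.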
Further investigation showed that 25 of the 30 candidates mapped successfully to another RNA-seq dataset comprising over a hundred breast cancer samples and covering tumor, tumor-adjacent, and normal tissue types. The profiles of these candidates generated by miRDeep2 suggest a correlation with the tissue type of the sample. Additional RNA-seq datasets can be processed with the pipeline established in this study for microRNA characterization, and in vitro experiments can be designed for further verification and profiling in the future.

Citation Format: Chien-Yueh Lee, Liang-Bo Wang, Mong-Hsun Tsai, Liang-Chuan Lai, Eric Y. Chuang. Identification of novel miRNAs in breast data of the next generation sequencing using miRDeep2 and Galaxy. [abstract]. In: Proceedings of the 104th Annual Meeting of the American Association for Cancer Research; 2013 Apr 6-10; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2013;73(8 Suppl):Abstract nr 2903. doi:10.1158/1538-7445.AM2013-2903

Note: This abstract was not presented at the AACR Annual Meeting 2013 because the presenter was unable to attend.

