The role of bioinformatics and computational biology in the life sciences has grown steadily since the emergence of large, complex datasets for understanding biological processes and the expression of traits. Although algorithmic approaches are available for predicting microRNA (miRNA), individual modules have to be executed separately for each intermediate result. Hence, an attempt was made to develop an integrated model for predicting miRNA in which all the structures are generated automatically once genomic sequences from varied datasets are submitted as input. A novel algorithm was developed for predicting miRNA in plants, using shell scripting for fast processing of large amounts of data. As part of the pipeline, software modules for generating the RNA secondary structure, the RNA structure in Extensible Markup Language (XML) format, and the RNA structure in pictorial view were developed using shell scripting by imposing the following constraints: (1) the miRNA should be part of a hairpin, (2) the miRNA length is approximately 21 nt, (3) it should start at the 41st position, and (4) the hairpin of a good miRNA is longer than 50 nt. Built-in modules, namely ‘samtools’ and ‘mfold’, were used in the scripts to generate the RNA secondary structure in graphical form and in XML format. These modules were executed on a representative tobacco genome survey sequence and were able to retrieve the above structures, which serve as input for predicting miRNA; an output file was then generated listing the good miRNA sequences from the given structure. This algorithm can be used for predicting miRNA from genomic sequences produced by upcoming tobacco and other plant genome projects.
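The four constraints listed in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the shell scripting the authors describe, and it is not their implementation. The function name `extract_candidate`, the fixed length of exactly 21 nt, and the reading of "41st position" as a 1-based offset into the hairpin sequence are all assumptions made for illustration.

```python
from typing import Optional

# Values taken from the constraints in the abstract; how they are
# applied below is an assumption, not the authors' method.
MIRNA_START = 41         # 1-based start position within the hairpin (assumed reading)
MIRNA_LENGTH = 21        # "approximately 21 nt"; fixed at 21 here for simplicity
MIN_HAIRPIN_LENGTH = 50  # the hairpin must be longer than this


def extract_candidate(hairpin: str,
                      start: int = MIRNA_START,
                      length: int = MIRNA_LENGTH,
                      min_hairpin: int = MIN_HAIRPIN_LENGTH) -> Optional[str]:
    """Return the candidate miRNA from a hairpin sequence, or None
    if any of the four constraints is not met."""
    hairpin = hairpin.strip().upper()

    # Constraint 4: the hairpin must be longer than 50 nt.
    if len(hairpin) <= min_hairpin:
        return None

    # Constraint 3: the candidate starts at the 41st position (1-based).
    begin = start - 1
    end = begin + length

    # Constraint 1: the candidate must lie entirely inside the hairpin.
    if end > len(hairpin):
        return None

    # Constraint 2: the slice is exactly `length` nucleotides long.
    return hairpin[begin:end]
```

Calling `extract_candidate` on a hairpin sequence returns the 21-nt window beginning at position 41, or `None` when the hairpin is too short to satisfy the constraints. A fuller pipeline would also inspect the folded secondary structure, which this sketch does not attempt.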