RANDTRAN: Random transcriptome sequence generator that accounts for partition specific features in eukaryotic mRNA datasets

E A Borzov, M Yu Skoblov, M V Ivanov, A V Marakhonov, P B Drozdova, A V Baranova

doi:10.1134/s0026893314050021

Abstract

The generation of true random and pseudorandom control sequences is an important problem in computational biology. Available random sequence generators differ in their underlying probabilistic models, which often remain undisclosed to users. Random sequences produced by different probabilistic models differ substantially, yet they are commonly used as baselines for evaluating motif frequencies. Moreover, modern bioinformatics studies often require a matching control transcriptome with emulated partitions into ORFs and 5'- and 3'-UTRs, as well as a matching proportion of non-coding RNAs, rather than relatively simple continuous control sequences. Here we describe a novel random sequence generation tool, RANDTRAN, that accounts for the length distribution of the 5' and 3' non-translated regions in a given transcriptome and for the partition-specific di- and trinucleotide compositions of translated and non-translated regions. RANDTRAN outputs matching control transcriptomes as ready-to-use input files compatible with the UCSC Genome Browser. These features may be useful for generating control sequence sets for common types of computational analysis of sequence motifs in various sets of RNA. RANDTRAN is available for free download at http://www.genereseairch.ru/images/Randtran.rar.
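
The general idea of a partition-aware control transcriptome can be illustrated with a minimal Python sketch. This is not RANDTRAN's actual algorithm, which the abstract does not specify; it assumes a first-order Markov chain (dinucleotide composition only, no trinucleotide or codon model, no start or stop codon handling) fitted separately to each partition, with partition lengths resampled from the input. The function names and the `(utr5, orf, utr3)` input layout are hypothetical.

```python
import random
from collections import Counter, defaultdict


def dinucleotide_model(seqs):
    """Count observed dinucleotides as first-order transitions a -> b."""
    counts = defaultdict(Counter)
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    return counts


def weighted_pick(counter, rng):
    """Pick one key of a {base: count} mapping in proportion to its count."""
    keys = sorted(counter)
    return rng.choices(keys, weights=[counter[k] for k in keys])[0]


def sample_sequence(counts, length, rng):
    """Draw a sequence of the given length from the fitted Markov chain.

    Assumes the model was fitted on at least one sequence of length >= 2.
    """
    if length == 0:
        return ""
    # Start bases are weighted by how often each base opens a dinucleotide.
    starts = {a: sum(c.values()) for a, c in counts.items()}
    seq = [weighted_pick(starts, rng)]
    while len(seq) < length:
        nxt = counts.get(seq[-1])
        # A base seen only at sequence ends has no successor: restart the chain.
        seq.append(weighted_pick(nxt, rng) if nxt else weighted_pick(starts, rng))
    return "".join(seq)


def random_transcriptome(transcripts, n, seed=0):
    """Generate n synthetic (utr5, orf, utr3) triples.

    transcripts: list of (utr5, orf, utr3) strings (hypothetical layout).
    Each partition gets its own composition model and its own empirical
    length distribution, so partitions are never mixed.
    """
    rng = random.Random(seed)
    parts = list(zip(*transcripts))  # all 5'-UTRs, all ORFs, all 3'-UTRs
    models = [dinucleotide_model(p) for p in parts]
    return [
        tuple(
            sample_sequence(m, len(rng.choice(p)), rng)
            for m, p in zip(models, parts)
        )
        for _ in range(n)
    ]


if __name__ == "__main__":
    toy = [
        ("GGCGCC", "AUGGCUGCUUAA", "UUUUAUUUA"),
        ("GCGG", "AUGAAAUAG", "AUUUAUU"),
    ]
    for utr5, orf, utr3 in random_transcriptome(toy, 3, seed=1):
        print(utr5, orf, utr3)
```

Fitting one model per partition is the design choice that matters here: a single model trained on whole transcripts would blur the compositional differences between coding and non-translated regions that the abstract identifies as important for motif-frequency baselines.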
