uShuffle: A useful tool for shuffling biological sequences while preserving the k-let counts

Minghui Jiang, James Anderson, Joel Gillespie, Martin Mayne

doi:10.1186/1471-2105-9-192

Abstract

Background: Randomly shuffled sequences are routinely used in sequence analysis to evaluate the statistical significance of a biological sequence. In many cases, biologists need sophisticated shuffling tools that preserve not only the counts of distinct letters but also higher-order statistics such as doublet counts, triplet counts, and, in general, k-let counts.

Results: We present a sequence analysis tool (named uShuffle) for generating uniform random permutations of biological sequences (such as DNAs, RNAs, and proteins) that preserve the exact k-let counts. The uShuffle tool implements the latest variant of the Euler algorithm and uses Wilson's algorithm in the crucial step of arborescence generation. It is carefully engineered and extremely efficient. The uShuffle tool achieves maximum flexibility by allowing arbitrary alphabet size and let size. It can be used as a command-line program, a web application, or a utility library. Source code in C, Java, and C#, and integration instructions for Perl and Python are provided.

Conclusion: The uShuffle tool surpasses existing implementations of the Euler algorithm in both performance and flexibility. It is a useful tool for the bioinformatics community.
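The abstract names two ingredients: the Euler algorithm and Wilson's algorithm for arborescence generation. The following Python sketch illustrates how these two ideas can fit together. It is not the uShuffle source code; the function names `klet_counts` and `shuffle_preserving_klets` are hypothetical, and the handling of very short sequences is a simplification made for this example. Each k-let is treated as an edge between its (k-1)-letter prefix and suffix, a random arborescence rooted at the final (k-1)-let fixes the last exit edge of every other vertex, and the remaining edges are shuffled before the graph is walked from the first (k-1)-let.

```python
import random
from collections import Counter, defaultdict


def klet_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shuffle_preserving_klets(seq, k, rng=random):
    """Return a random rearrangement of seq with the same k-let counts.

    Illustrative sketch only (hypothetical helper, not the uShuffle API).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if k == 1:
        # Letter counts only: an ordinary shuffle suffices.
        letters = list(seq)
        rng.shuffle(letters)
        return "".join(letters)
    if len(seq) <= k:
        # Simplification for this sketch: at most one k-let, return as is.
        return seq

    m = k - 1
    # Vertices are (k-1)-lets; consecutive vertices share one k-let (an edge).
    nodes = [seq[i:i + m] for i in range(len(seq) - m + 1)]
    out_edges = defaultdict(list)
    for u, v in zip(nodes, nodes[1:]):
        out_edges[u].append(v)
    start, root = nodes[0], nodes[-1]

    # Wilson's algorithm: random walks toward the current tree; overwriting
    # tree_next on revisits erases loops implicitly.
    in_tree = {root}
    tree_next = {}
    for v in list(out_edges):
        u = v
        while u not in in_tree:
            tree_next[u] = rng.choice(out_edges[u])
            u = tree_next[u]
        u = v
        while u not in in_tree:
            in_tree.add(u)
            u = tree_next[u]

    # For each vertex except the root, the arborescence edge is used last;
    # all other outgoing edges are placed in random order before it.
    order = {}
    for u, successors in out_edges.items():
        successors = list(successors)
        if u != root:
            successors.remove(tree_next[u])
            rng.shuffle(successors)
            successors.append(tree_next[u])
        else:
            rng.shuffle(successors)
        order[u] = successors

    # Walk the edges in the chosen order, emitting one new letter per edge.
    used = Counter()
    result = [start]
    u = start
    for _ in range(len(nodes) - 1):
        v = order[u][used[u]]
        used[u] += 1
        result.append(v[-1])
        u = v
    return "".join(result)
```

A call such as `shuffle_preserving_klets("ACGTTGCAACGT", 2)` returns a string of the same length whose doublet counts, as reported by `klet_counts`, match those of the input. Passing a `random.Random(seed)` instance as `rng` makes the result reproducible.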

Highlights

Shuffled sequences are routinely used in sequence analysis to evaluate the statistical significance of a biological sequence.
Altschul and Erickson [2] presented the first algorithm for generating truly uniform random sequences that preserve either the doublet counts or the triplet counts or both; a crucial step of their algorithm, the generation of random arborescences, depends on a trial-and-error procedure, which is a potential performance bottleneck.
We have performed two sets of experiments to test the performance of two major forms of the uShuffle tool: we first benchmark the performance of the uShuffle C library, then compare the performance of the uShuffle Java applet with the shufflet program by Coward [11].

Summary

Introduction

Shuffled sequences are routinely used in sequence analysis to evaluate the statistical significance of a biological sequence. It is known that the stability of an RNA secondary structure depends crucially on the stacking of adjacent base pairs; the frequencies of distinct doublets in the random sequences are therefore important considerations in such analysis [4,25]. Biologists need sophisticated shuffling tools that preserve the counts of distinct letters and higher-order statistics such as doublet counts, triplet counts, and, in general, k-let counts.
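To make the distinction between letter counts and doublet counts concrete, here is a small Python illustration. The helper name `count_klets` and the two toy sequences are assumptions chosen for this example and do not come from the paper. The two sequences contain the same letters, yet their doublet counts differ, which is why a plain letter-level shuffle is not enough when doublet statistics matter.

```python
from collections import Counter


def count_klets(seq, k):
    """Count every overlapping substring of length k in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


original = "AACC"
rearranged = "ACAC"  # same letters, different order

# Single-letter (1-let) counts are identical ...
same_letters = count_klets(original, 1) == count_klets(rearranged, 1)

# ... but doublet (2-let) counts are not.
same_doublets = count_klets(original, 2) == count_klets(rearranged, 2)

print(same_letters)   # True
print(same_doublets)  # False
print(dict(count_klets(original, 2)))    # {'AA': 1, 'AC': 1, 'CC': 1}
print(dict(count_klets(rearranged, 2)))  # {'AC': 2, 'CA': 1}
```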


Journal: BMC Bioinformatics
Publication Date: Apr 11, 2008
Citations: 172
License type: CC BY 2.0


