Abstract

Background: DNA repeats, such as transposable elements, minisatellites and palindromic sequences, are abundant in sequences and have been shown to play significant functional roles in the evolution of their host genomes. In a previous study, we introduced the concept of a repeat DNA module, a flexible motif that occurs at least twice in the sequences. This concept was embedded into ModuleOrganizer, a tool for detecting repeat modules in a set of sequences. However, applying it to larger sequences remains difficult.

Results: Here we present Visual ModuleOrganizer, a Java graphical interface that provides access to a new, optimized version of the ModuleOrganizer tool. This version was recoded in C++ with compressed suffix tree data structures. The result is lower memory usage (at least a 120-fold decrease on average) and a computation time reduced by at least a factor of four during module detection in large sequences. The Visual ModuleOrganizer interface allows users to easily choose ModuleOrganizer parameters and to display the results graphically. Moreover, Visual ModuleOrganizer dynamically adjusts the graphical results according to four main parameters: gene annotations, overlap of modules with known annotations, presence of the module in a minimal number of sequences, and the minimal length of the modules. As a case study, the analysis of FoldBack4 sequences clearly demonstrated that our tools can be extended to comparative and evolutionary analyses of any repeat sequence elements in a set of genomic sequences. With the increasing number of sequences available in public databases, it is now possible to perform comparative analyses of repeated DNA modules in a graphical and user-friendly manner within a reasonable time.

Availability: The Visual ModuleOrganizer interface and the new version of the ModuleOrganizer tool are freely available at: http://lcb.cnrs-mrs.fr/spip.php?rubrique313.
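The display filters named in the abstract can be pictured with a minimal sketch. Everything below is an assumption made for illustration: the `Module` record, its fields, and the functions `filter_modules` and `overlaps_annotation` are hypothetical and are not taken from Visual ModuleOrganizer, whose internal data model is not described here. Annotations are assumed to be half-open intervals `(sequence id, start, end)`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Module:
    """Hypothetical record for a repeat module (not the tool's real data model)."""
    name: str
    length: int
    occurrences: Tuple[Tuple[str, int], ...]  # (sequence id, start position)

    def sequence_count(self) -> int:
        # Number of distinct sequences in which the module occurs.
        return len({seq_id for seq_id, _ in self.occurrences})


def filter_modules(modules: List[Module], min_length: int,
                   min_sequences: int) -> List[Module]:
    """Keep modules that are long enough and present in enough sequences."""
    return [m for m in modules
            if m.length >= min_length and m.sequence_count() >= min_sequences]


def overlaps_annotation(module: Module,
                        annotations: List[Tuple[str, int, int]]) -> bool:
    """True if any occurrence overlaps an annotation on the same sequence."""
    for seq_id, start in module.occurrences:
        end = start + module.length  # half-open interval [start, end)
        for ann_seq, ann_start, ann_end in annotations:
            if seq_id == ann_seq and start < ann_end and ann_start < end:
                return True
    return False


# Made-up example data, for illustration only.
modules = [
    Module("m1", 25, (("seqA", 10), ("seqB", 40))),
    Module("m2", 8, (("seqA", 100), ("seqA", 300))),
    Module("m3", 30, (("seqA", 500), ("seqB", 520), ("seqC", 90))),
]
annotations = [("seqB", 60, 120)]

kept = filter_modules(modules, min_length=20, min_sequences=2)
annotated = [m.name for m in modules if overlaps_annotation(m, annotations)]
```

In this sketch, raising `min_sequences` or `min_length` only shrinks the displayed set, which is one plausible reading of how such thresholds would be adjusted interactively.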

Highlights

  • Data-processing improvements: during a run of the previous algorithm, the positions of the selected maximal repeats (MR) were first copied into computer memory before all potential modules were built

  • To make ModuleOrganizer's memory usage more efficient, the new algorithm reads the MR positions directly through the compressed suffix tree data structure
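The difference between copying repeat positions and reading them through the index can be sketched as follows. This is a simplified illustration under stated assumptions, not the ModuleOrganizer implementation: it uses a plain, naively built suffix array instead of a compressed suffix tree, and the function names `build_suffix_array`, `repeat_interval` and `iter_positions` are hypothetical. The point is only that a repeat can be represented by a range of index rows, and its positions read lazily from the index when needed.

```python
from typing import Iterator, List, Tuple


def build_suffix_array(text: str) -> List[int]:
    """Naive suffix array construction; adequate for a small illustration."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def repeat_interval(text: str, sa: List[int], pattern: str) -> Tuple[int, int]:
    """Half-open range [lo, hi) of suffix-array rows whose suffix starts with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first row whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first row whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def iter_positions(sa: List[int], interval: Tuple[int, int]) -> Iterator[int]:
    """Yield occurrence positions one at a time, straight from the index."""
    lo, hi = interval
    for row in range(lo, hi):
        yield sa[row]


text = "ACGTACGTTACG"
sa = build_suffix_array(text)

# The repeat is stored as two integers (a row range), not as a list of positions.
interval = repeat_interval(text, sa, "ACG")
positions = sorted(iter_positions(sa, interval))
```

Here each repeat costs two integers of bookkeeping regardless of how many times it occurs; the positions themselves stay in the shared index. Actual savings in the real tool depend on the compressed suffix tree it uses.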

Summary

Results

We present Visual ModuleOrganizer, a Java graphical interface that provides access to a new, optimized version of the ModuleOrganizer tool. This version was recoded in C++ with compressed suffix tree data structures, which leads to lower memory usage (at least a 120-fold decrease on average) and reduces computation time by at least a factor of four during module detection in large sequences. With the increasing number of sequences available in public databases, it is now possible to perform comparative analyses of repeated DNA modules in a graphical and user-friendly manner within a reasonable time.
