Abstract
This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks: local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. It also supports filtering operations that select strings from a set and establish their statistical significance via z-score computation. None of the algorithms is new; although they are generally regarded as fundamental for sequence analysis, they have not previously been implemented in a single, consistent software package, as we do here. Our main contribution is therefore to fill this gap between algorithmic theory and practice by providing an extensible and easy-to-use software library that includes algorithms for the string matching and alignment problems mentioned above. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available under the GNU GPL.
Highlights
Computational analysis of biological sequences has become an extremely rich field of modern science and a highly interdisciplinary area, where statistical and algorithmic methods play a key role [1,2]
Sequence alignment tools have been at the heart of this field for nearly 50 years, and it is commonly accepted that the initial investigation of the mathematical notion of alignment and distance is one of the major contributions of S
We concentrate on alignment problems involving only two sequences
Summary
Computational analysis of biological sequences has become an extremely rich field of modern science and a highly interdisciplinary area, where statistical and algorithmic methods play a key role [1,2]. The first two functions allow the user to select a subset of strings from a given set and to assess its statistical significance via z-score computation [18]. The function that generates a user-specified model organism provides, in a suitable format, all the probabilistic information needed by the z-score function. A detailed user manual, together with installation procedures, file formats, etc., is given at the supplementary web site [25].
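
To make the z-score idea concrete, the following minimal Python sketch standardizes the observed number of occurrences of a string against counts obtained from a background sample. It is not the BATS interface: the function names `count_occurrences` and `z_score` are hypothetical, and the background mean and standard deviation are estimated empirically from a list of counts, whereas the library derives the required probabilistic information from its model organism description [18].

```python
import statistics


def count_occurrences(word, text):
    """Count occurrences of `word` in `text`, allowing overlaps.

    Hypothetical helper for illustration; not part of BATS.
    """
    if not word:
        raise ValueError("word must be non-empty")
    last_start = len(text) - len(word)
    return sum(1 for i in range(last_start + 1) if text.startswith(word, i))


def z_score(observed, background_counts):
    """Return (observed - mean) / standard deviation.

    `background_counts` is assumed to hold counts of the same word in
    sequences drawn from some background model. The population standard
    deviation of that sample is used as the scale.
    """
    mean = statistics.mean(background_counts)
    sd = statistics.pstdev(background_counts)
    if sd == 0:
        raise ValueError("background counts have zero variance")
    return (observed - mean) / sd


if __name__ == "__main__":
    observed = count_occurrences("AA", "AAAA")  # overlapping matches are counted
    print(observed, z_score(observed, [1, 3]))
```

A large positive z-score indicates that the string occurs more often than the background would suggest, and a large negative one that it occurs less often. How the background counts are produced is left open here and would depend on the chosen probabilistic model.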