Abstract

Background

Accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise statistical significance and database statistical significance was conducted. In this paper, we extend the earlier work on pairwise statistical significance by incorporating the use of multiple parameter sets.

Results

Results for a knowledge discovery application of homology detection reveal that using multiple parameter sets for pairwise statistical significance estimates gives better coverage than using a single parameter set, at least at some error levels. Further, pairwise statistical significance using multiple parameter sets is shown to perform significantly better than the database statistical significance estimates reported by BLAST and PSI-BLAST, and comparably to, and at times significantly better than, SSEARCH. Using non-zero parameter set change penalty values gives better performance than using a zero penalty.

Conclusion

The fact that homology detection performance does not degrade when using multiple parameter sets is strong evidence for the validity of the assumption that the alignment score distribution follows an extreme value distribution even when multiple parameter sets are used. The parameter set change penalty is a useful parameter for alignment using multiple parameter sets. Pairwise statistical significance using multiple parameter sets can be used effectively to determine the relatedness of one or a few pairs of sequences without performing a time-consuming database search.
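To make alignment with multiple parameter sets and a parameter set change penalty more concrete, here is a minimal Python sketch. It assumes a simplified model that the abstract does not specify: each hypothetical `ParamSet` holds a match score, a mismatch score, and a linear gap penalty (practical use would involve substitution matrices and affine gaps), every alignment column is scored under exactly one set, and switching sets between consecutive columns costs `change_penalty`. It is a Smith-Waterman-style recurrence with one DP layer per parameter set, offered as an illustration rather than the authors' implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ParamSet:
    """Hypothetical simplified parameter set (not a real scoring matrix)."""
    match: int
    mismatch: int
    gap: int  # cost subtracted per gap position (linear, not affine)

    def sub(self, a: str, b: str) -> int:
        return self.match if a == b else self.mismatch


def local_align_multi(x: str, y: str, sets: Sequence[ParamSet],
                      change_penalty: int) -> int:
    """Best local alignment score when each column may use any parameter set,
    paying change_penalty whenever consecutive columns switch sets."""
    K, m, n = len(sets), len(x), len(y)
    # H[k][i][j]: best score of a local alignment ending at cell (i, j)
    # whose last column was scored with parameter set k (0 = empty alignment).
    H = [[[0] * (n + 1) for _ in range(m + 1)] for _ in range(K)]
    best = 0

    def enter(k: int, i: int, j: int) -> int:
        # Best score at cell (i, j) if the next column uses set k: stay in
        # set k for free, or come from another set and pay the penalty.
        return max(H[kk][i][j] - (0 if kk == k else change_penalty)
                   for kk in range(K))

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            for k, p in enumerate(sets):
                score = max(
                    0,  # start a new local alignment at this cell
                    enter(k, i - 1, j - 1) + p.sub(x[i - 1], y[j - 1]),
                    enter(k, i - 1, j) - p.gap,  # gap in y
                    enter(k, i, j - 1) - p.gap,  # gap in x
                )
                H[k][i][j] = score
                best = max(best, score)
    return best


if __name__ == "__main__":
    strict = ParamSet(match=3, mismatch=-3, gap=3)   # hypothetical values
    lenient = ParamSet(match=1, mismatch=0, gap=1)   # hypothetical values
    x, y = "AAAATTTTAAAA", "AAAAGGGGAAAA"
    for c in (0, 2, 100):
        print(c, local_align_multi(x, y, [strict, lenient], c))
    # prints "0 24", "2 20", "100 12"
```

With these two hypothetical sets, a zero penalty lets the alignment score the matching blocks with the strict set and cross the mismatched middle with the lenient set; a moderate penalty charges each of the two switches; a very large penalty rules out switching, so the result falls back to the best single-set score.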

Highlights

  • Accurate estimation of statistical significance of a pairwise alignment is an important problem in sequence comparison

  • The proposed pairwise statistical significance using multiple parameter sets performs significantly better than BLAST and PSI-BLAST at all error levels, comparably to SSEARCH at low error levels, and significantly better than SSEARCH at higher error levels

  • The results show that PSI-BLAST gave poorer performance than pairwise statistical significance using multiple parameter sets, even with position-specific scoring matrices (PSSMs) constructed against the benchmark CATH database used in our experiments

Summary

Introduction

Accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Local sequence alignment plays a major role in the analysis of DNA and protein sequences [1,2,3]. It is a basic step in many other applications, such as detecting homology, inferring protein structure and function, and deciphering evolutionary relationships. Because the alignment score distribution depends on various factors, such as the alignment program, scoring scheme, sequence lengths, and sequence compositions [10], it is possible for two alignments of different sequence pairs to have scores x and y with x
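One common way to obtain a pairwise (rather than database) significance estimate is sketched below, under stated assumptions. One sequence is shuffled many times, which preserves its length and composition, two of the factors named above; each shuffle is aligned to the other sequence; an extreme value (Gumbel) distribution is fitted to these null scores; and the P-value is read off as P(S >= x) = 1 - exp(-exp(-lambda (x - mu))). The match/mismatch/gap values, the method-of-moments fit, and all function names are illustrative assumptions; the original work may use substitution matrices, affine gaps, and a different fitting procedure such as maximum likelihood.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def sw_score(x: str, y: str, match: int = 2, mismatch: int = -1, gap: int = 2) -> int:
    """Smith-Waterman local alignment score with simplified linear-gap scoring."""
    prev = [0] * (len(y) + 1)
    best = 0
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] - gap, cur[j - 1] - gap)
            best = max(best, cur[j])
        prev = cur
    return best


def fit_gumbel_moments(scores):
    """Method-of-moments estimates (lam, mu) for a Gumbel distribution:
    variance = pi^2 / (6 lam^2), mean = mu + gamma / lam."""
    lam = math.pi / (statistics.stdev(scores) * math.sqrt(6))
    mu = statistics.mean(scores) - EULER_GAMMA / lam
    return lam, mu


def gumbel_pvalue(score, lam, mu):
    """P(S >= score) = 1 - exp(-exp(-lam (score - mu))); expm1 keeps tiny
    P-values from rounding to zero."""
    return -math.expm1(-math.exp(-lam * (score - mu)))


def pairwise_significance(x, y, n_shuffles=200, seed=0):
    """Return (observed score, estimated P-value) for aligning x with y,
    using shuffled copies of y as the null model."""
    rng = random.Random(seed)
    observed = sw_score(x, y)
    null_scores = []
    for _ in range(n_shuffles):
        letters = list(y)
        rng.shuffle(letters)  # keeps the length and composition of y
        null_scores.append(sw_score(x, "".join(letters)))
    lam, mu = fit_gumbel_moments(null_scores)
    return observed, gumbel_pvalue(observed, lam, mu)
```

Because the null scores come from the specific pair being compared, the fitted lambda and mu (and hence the P-value) reflect that pair's lengths and compositions instead of database-wide statistics.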
