Abstract
Background: Accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise statistical significance with database statistical significance was conducted. In this paper, we extend the earlier work on pairwise statistical significance by incorporating the use of multiple parameter sets.
Results: Results for a knowledge discovery application of homology detection reveal that using multiple parameter sets for pairwise statistical significance estimates gives better coverage than using a single parameter set, at least at some error levels. Further, the results of pairwise statistical significance using multiple parameter sets are shown to be significantly better than the database statistical significance estimates reported by BLAST and PSI-BLAST, and comparable to, and at times significantly better than, those of SSEARCH. Using non-zero parameter set change penalty values gives better performance than a zero penalty.
Conclusion: The fact that homology detection performance does not degrade when using multiple parameter sets is strong evidence for the validity of the assumption that the alignment score distribution follows an extreme value distribution even when using multiple parameter sets. The parameter set change penalty is a useful parameter for alignment using multiple parameter sets. Pairwise statistical significance using multiple parameter sets can be used effectively to determine the relatedness of one pair (or a few pairs) of sequences without performing a time-consuming database search.
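As a rough illustration of why the extreme value assumption matters, the sketch below computes a P-value from the commonly used Gumbel-type form P(S >= x) = 1 - exp(-K m n e^(-lambda x)). This is not the paper's estimation procedure: the function name `evd_p_value` and all parameter values (`lam`, `K`, and the sequence lengths `m` and `n`) are hypothetical and chosen only for illustration. In practice such parameters would be estimated separately for each sequence pair.
```python
import math


def evd_p_value(score, lam, K, m, n):
    """P(S >= score) under a Gumbel-type extreme value distribution.

    lam and K are the distribution parameters; m and n are the sequence
    lengths. All values passed in below are hypothetical.
    """
    expected_hits = K * m * n * math.exp(-lam * score)
    # -expm1(-e) equals 1 - exp(-e) but stays accurate when e is tiny.
    return -math.expm1(-expected_hits)


# Two hypothetical sequence pairs with different pair-specific parameters.
p_a = evd_p_value(score=60, lam=0.30, K=0.1, m=200, n=200)
p_b = evd_p_value(score=70, lam=0.20, K=0.1, m=200, n=200)

# With these illustrative values the lower-scoring alignment gets the
# smaller P-value, i.e. it is the more significant of the two.
print(p_a < p_b)
```
Under these assumed values the alignment with the lower raw score is the more significant one, which is why significance has to be estimated from the score distribution for the specific pair and not read off the raw score.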
Highlights
Accurate estimation of statistical significance of a pairwise alignment is an important problem in sequence comparison
It is clear that the proposed pairwise statistical significance using multiple parameter sets performs significantly better than BLAST and PSI-BLAST at all error levels, comparable to SSEARCH at low error levels, and significantly better than SSEARCH at higher error levels
The results show that PSI-BLAST gave poorer performance than pairwise statistical significance using multiple parameter sets, even with position-specific scoring matrices (PSSMs) constructed against the benchmark CATH database used in our experiments
Summary
Accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Local sequence alignment plays a major role in the analysis of DNA and protein sequences [1,2,3]. It is the basic step of many other applications, such as detecting homology, finding protein structure and function, and deciphering evolutionary relationships. Because the alignment score distribution depends on various factors such as the alignment program, scoring scheme, sequence lengths, and sequence compositions [10], it is possible to have two alignments of different sequence pairs with scores x and y with x