Abstract

The identification of homologous DNA is a fundamental building block of comparative genomic and molecular evolution studies. To date, pairwise local sequence alignment methods have been the prevailing technique for identifying homologous nucleotides. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with a previously described efficient filtration method for local multiple alignment. During gapped extension, we use the MUSCLE implementation of progressive multiple alignment with iterative refinement. The resulting gapped extensions may contain alignments of unrelated sequence. We detect and remove such undesirable alignments using a hidden Markov model (HMM) to predict the posterior probability of homology. The HMM emission frequencies for nucleotide substitutions can be derived from any strand/species-symmetric nucleotide substitution matrix, and we have developed a method to adapt an arbitrary substitution matrix (e.g. HOXD) to organisms with different G+C content. We evaluate the performance of our method and previous approaches on a hybrid dataset of real genomic DNA with simulated interspersed repeats. Our method outperforms existing methods in terms of sensitivity, positive predictive value, and accuracy in localizing the boundaries of homology. The described methods have been implemented in the free, open-source procrastAligner software, available from: http://alggen.lsi.upc.es/recerca/align/procrastination.
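The idea of using an HMM to flag unrelated sequence inside a gapped extension can be illustrated with a small posterior-decoding sketch. It is not the procrastAligner implementation and rests on several assumptions. It uses a two-state HMM ("homologous" vs "unrelated") over a pairwise, ungapped alignment rather than the multiple alignments the method actually processes. Its emission tables come from an illustrative `identity` parameter plus G+C-derived background frequencies, not from HOXD or the authors' G+C adaptation method. The function names and the self-transition parameter `stay` are hypothetical.

```python
BASES = "ACGT"


def base_freqs(gc):
    """Background base frequencies for a genome with G+C fraction `gc`,
    assuming strand symmetry (P(G) = P(C), P(A) = P(T))."""
    at = (1.0 - gc) / 2.0
    return {"A": at, "T": at, "G": gc / 2.0, "C": gc / 2.0}


def pair_emissions(gc, identity=0.8):
    """Hypothetical emission tables P(a, b | state) for ungapped base pairs.

    State 0 ("homologous"): symmetric table with total mass `identity` on
    identical pairs and the remainder spread over mismatches.
    State 1 ("unrelated"): both bases drawn independently from background.
    `identity` is an illustrative placeholder, not a HOXD-derived value."""
    pi = base_freqs(gc)
    sum_sq = sum(p * p for p in pi.values())
    mismatch_scale = (1.0 - identity) / (1.0 - sum_sq)  # makes the table sum to 1
    hom, unrel = {}, {}
    for a in BASES:
        for b in BASES:
            hom[a, b] = identity * pi[a] if a == b else mismatch_scale * pi[a] * pi[b]
            unrel[a, b] = pi[a] * pi[b]
    return [hom, unrel]


def posterior_homology(columns, gc=0.5, stay=0.99, start_hom=0.5):
    """Forward-backward posterior P(homologous | all columns) per column.

    `columns` is a list of (base_x, base_y) pairs; gaps are not handled."""
    emit = pair_emissions(gc)
    trans = [[stay, 1.0 - stay], [1.0 - stay, stay]]
    init = [start_hom, 1.0 - start_hom]
    n = len(columns)

    # Forward pass, rescaling each column to sum to 1 to avoid underflow.
    fwd, scale = [], []
    for t, col in enumerate(columns):
        if t == 0:
            cur = [init[k] * emit[k][col] for k in range(2)]
        else:
            cur = [emit[k][col] * sum(fwd[t - 1][j] * trans[j][k] for j in range(2))
                   for k in range(2)]
        s = sum(cur)
        fwd.append([v / s for v in cur])
        scale.append(s)

    # Backward pass, reusing the forward scale factors.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        nxt = columns[t + 1]
        bwd[t] = [sum(trans[j][k] * emit[k][nxt] * bwd[t + 1][k] for k in range(2))
                  / scale[t + 1] for j in range(2)]

    # Posterior probability of the homologous state (index 0) at each column.
    posts = []
    for t in range(n):
        joint = [fwd[t][k] * bwd[t][k] for k in range(2)]
        posts.append(joint[0] / sum(joint))
    return posts


if __name__ == "__main__":
    # 20 identical columns followed by 20 mismatched columns.
    x = "ACGT" * 5 + "AAAACCCCGGGGTTTTAAAA"
    y = "ACGT" * 5 + "CCCCAAAATTTTGGGGCCCC"
    for i, p in enumerate(posterior_homology(list(zip(x, y)))):
        print(i, round(p, 3))
```

In this toy run, the posterior stays high across the identical prefix and drops sharply once the mismatched suffix begins. Columns whose posterior falls below some threshold would be candidates for trimming. That threshold, like every other parameter here, is an assumption and not a value taken from the paper.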
