Sequence Comparison Tools Research Articles

Background: Control of porcine reproductive and respiratory syndrome (PRRS) represents a tremendous challenge. The trend is now toward managing the disease collectively. In Quebec, area and regional control and elimination (ARC&E) initiatives started in 2011; diagnostic testing, including ORF5 sequencing, and sharing of information among stakeholders are widely promoted. At the provincial level, a data-sharing agreement was signed by Quebec swine practitioners allowing PRRS virus (PRRSV) sequences to be transferred to a database maintained by the Laboratoire d’épidémiologie et de médecine porcine (LEMP-DB). Several interactive tools were developed and are available to veterinarians to allow comparison of PRRSV ORF5 sequences within ARC&E projects or provincially while managing confidentiality issues.

Results: Between January 1st 2010 and December 31st 2018, 4346 PRRSV ORF5 sequences were gathered into the LEMP-DB, involving 1254 sites and 43 practicing veterinarians. Approximately 34% of the submissions were from ARC&E projects. Using a novel web-based sequence comparison tool, each veterinarian has access to information on his/her clients' sequences and can compare each sequence with 1) commercial vaccine strains, 2) historical samples from the same site, and 3) all sequences submitted to the database over the last 4 years. New introductions of PRRSV into breeding herds can be monitored using a new tool based on comparison of sequences at the provincial level. Each month, graphs showing the number of introductions per month and the yearly cumulative total are updated. Between August 1st 2014 and December 31st 2018, 233 introductions were detected on 180 different breeding sites. Following a data-sharing agreement, veterinarians involved in ARC&E projects have access to an interactive mapping tool to locate pig sites, compare sequence similarity between participating sites and visualize the results on the map.

Conclusions: The structure developed in Quebec to collect, analyse and share sequencing data efficiently provided useful information to the swine industry at both provincial and regional levels while dealing with confidentiality issues.
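The abstract does not say how the comparison tool scores similarity between ORF5 sequences. As a minimal sketch only, assuming the sequences are already aligned to equal length and that similarity is plain percent identity, ranking a query against a set of references (vaccine strains, earlier samples from the same site, or the wider database) might look like the following. The function names and the example labels are hypothetical and are not taken from the LEMP-DB tools.

```python
def percent_identity(a, b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped.
    This scoring rule is an assumption made for illustration.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def rank_by_identity(query, references):
    """Return (name, identity) pairs sorted from most to least similar.

    `references` maps a label to an aligned sequence.
    """
    scored = [(name, percent_identity(query, seq)) for name, seq in references.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy 4-letter sequences stand in for full ORF5 sequences.
    refs = {"vaccine_x": "ACGA", "site_previous": "ACGT"}
    for name, identity in rank_by_identity("ACGT", refs):
        print(f"{name}: {identity:.1f}%")
```

A real pipeline would first align the sequences and would need a rule for deciding when a sequence counts as a new introduction; neither is specified in the abstract, so both are left out here.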

The use of $k$-word matches was developed as a fast alignment-free comparison method for DNA sequences in cases where long range contiguity has been compromised, for example, by shuffling, duplication, deletion or inversion of extended blocks of sequence. Here we extend the algorithm to amino acid sequences. We define a new statistic, the weighted word match, which reflects the varying degrees of similarity between pairs of amino acids. We computed the mean and variance, and simulated the distribution function for various forms of this statistic for sequences of identically and independently distributed letters. We present these results and a method for choosing an optimal word size. The efficiency of the method is tested by using simulated evolutionary sequences, and the results compared with BLAST.
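To make the idea of a $k$-word match count concrete, here is a minimal sketch. The exact-match count `d2` is the standard number of position pairs whose $k$-words are identical. The weighted variant is only one plausible form: each pair of $k$-words contributes the product of per-letter similarity weights. The abstract does not give the authors' definition, so this should not be read as their statistic. The similarity groups and the 0.5 weight are hypothetical toy values, not a real substitution matrix.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping k-words of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def d2(a, b, k):
    """Exact k-word match count: pairs (i, j) whose k-words are identical."""
    count_a = Counter(kmers(a, k))
    count_b = Counter(kmers(b, k))
    return sum(n * count_b[word] for word, n in count_a.items())


# Hypothetical toy groups of similar amino acids (illustration only).
TOY_GROUPS = ["ILVM", "FWY", "KRH", "DE", "ST", "NQ"]


def toy_weight(x, y):
    """Assumed weight: 1 for identical letters, 0.5 within a toy group, else 0."""
    if x == y:
        return 1.0
    if any(x in group and y in group for group in TOY_GROUPS):
        return 0.5
    return 0.0


def weighted_d2(a, b, k, weight=toy_weight):
    """Sum over all k-word pairs of the product of per-letter weights."""
    total = 0.0
    for u in kmers(a, k):
        for v in kmers(b, k):
            w = 1.0
            for x, y in zip(u, v):
                w *= weight(x, y)
                if w == 0.0:
                    break  # one dissimilar letter zeroes the whole word pair
            total += w
    return total


if __name__ == "__main__":
    print(d2("ACAC", "ACA", 2))
    print(weighted_d2("IL", "VL", 2))
```

With a weight that is 1 for identical letters and 0 otherwise, `weighted_d2` reduces to `d2`, which is a useful sanity check. The nested loop is quadratic in sequence length and is written for clarity rather than speed.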

Articles published on Sequence Comparison Tools

Graph-based analysis of DNA sequence comparison in closed cotton species: A generalized method to unveil genetic connections.

Family-Free Genome Comparison.

Fast and robust metagenomic sequence comparison through sparse chaining with skani

UniAligner: a parameter-free framework for fast sequence alignment.

Development of a Sequence Searchable Database of Celiac Disease-Associated Peptides and Proteins for Risk Assessment of Novel Food Proteins.

Parallel Fine-Grained Comparison of Long DNA Sequences in Homogeneous and Heterogeneous GPU Platforms With Pruning

ADACT: a tool for analysing (dis)similarity among nucleotide and protein sequences using minimal and relative absent words.

Using Multiple Fickett Bands to Accelerate Biological Sequence Comparisons

Porcine reproductive and respiratory syndrome virus: web-based interactive tools to support surveillance and control initiatives

BASTA – Taxonomic classification of sequences and sequence bins using last common ancestor estimations

Clustering of multi-domain protein sequences.

Quantifying the Number of Independent Organelle DNA Insertions in Genome Evolution and Human Health.

Sequence-, structure-, and dynamics-based comparisons of structurally homologous CheY-like proteins

20D-dynamic Representation of Protein Sequences

When Whole-Genome Alignments Just Won't Work: kSNP v2 Software for Alignment-Free SNP Discovery and Phylogenetics of Hundreds of Microbial Genomes

A Small-Group Activity Introducing the Use and Interpretation of BLAST

Genome-scale NCRNA homology search using a Hamming distance-based filtration strategy.

Consensus Decision for Protein Structure Classification

Comparative analysis of a cryptic thienamycin-like gene cluster identified in Streptomyces flavogriseus by genome mining

Weighted k-word matches: a sequence comparison tool for proteins
