The article presents an efficient multithreaded implementation of the modern 3SEQ algorithm for detecting recombinant genetic sequences, tested on viral genomes. The work was carried out as part of a project to create a Russian web platform (bioprojects.iis.nsk.su) for solving a wide range of data analysis problems in bioinformatics, virology and epidemiology. A recombinant viral genome emerges when two different genome variants of the same virus species exchange parts, which is possible when a host is infected with both variants simultaneously. The emergence of a recombinant is a rare but important event in the study of virus evolution. 3SEQ is one of the most promising existing algorithms for detecting recombinants, but the original authors' version runs only in single-threaded mode. We implemented this algorithm with support for multithreaded computation and with sample collection dates taken into account, which significantly increased computation speed. The software was used to search for recombinants in samples of influenza A H1N1 (only PB2 segments from 2174 genomes were analyzed), dengue virus (726 genomes), Ebola virus (865 genomes), and in two SARS-CoV-2 datasets (776 and 2132 genomes). No recombinants were found for influenza A H1N1 (PB2 segment) or for the first SARS-CoV-2 dataset (variants from Russia), which agrees with the analysis of the same data by the RDP algorithm. For the second SARS-CoV-2 dataset (variants from the Siberian Federal District), the only recombinant present in the dataset was correctly identified. In dengue viruses, 725 recombinants were found, with recombination region lengths ranging from 50 to 1000 nucleotides. In Ebola viruses, the recombination regions were shorter: 50 to 100 nucleotides in 572 recombinants and fewer than 50 nucleotides in 249 genomes.
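
To make the idea of a multithreaded, date-aware triplet scan more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the commonly described 3SEQ setup in which a child sequence is compared with two parents at informative sites and scored by the maximum descent of a +1/-1 walk. The p-value computation is omitted, and the date rule (a child must not be collected earlier than either parent) is an assumption made for illustration. The names `max_descent`, `candidate_triplets` and `scan` are hypothetical. Python threads are used only to show how triplets can be distributed over workers; because of the interpreter lock, a real speedup would need native threads or processes.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date
from itertools import combinations


def max_descent(parent_p, parent_q, child):
    """Largest drop of the +1/-1 walk over informative sites (walk starts at 0)."""
    height = 0  # current walk height
    peak = 0    # highest height reached so far
    best = 0    # largest drop from a peak seen so far
    for p, q, c in zip(parent_p, parent_q, child):
        if p == q:
            continue          # parents agree: site is uninformative
        if c == p:
            height += 1       # child matches parent P
        elif c == q:
            height -= 1       # child matches parent Q
        else:
            continue          # child matches neither parent
        peak = max(peak, height)
        best = max(best, peak - height)
    return best


def candidate_triplets(dates):
    """Yield (p, q, c) index triplets whose child is not older than either parent.

    Assumed date rule for this sketch, not taken from the article.
    """
    n = len(dates)
    for p, q in combinations(range(n), 2):
        for c in range(n):
            if c != p and c != q and dates[c] >= max(dates[p], dates[q]):
                yield p, q, c


def scan(seqs, dates, workers=4):
    """Score every date-compatible triplet using a pool of worker threads."""
    triplets = list(candidate_triplets(dates))

    def score(triplet):
        p, q, c = triplet
        # Sketch choice: try both parent orderings and keep the larger descent.
        forward = max_descent(seqs[p], seqs[q], seqs[c])
        backward = max_descent(seqs[q], seqs[p], seqs[c])
        return triplet, max(forward, backward)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(score, triplets))


# Toy aligned sequences: the third is a mosaic of the first two.
seqs = ["AAAAAAAA", "CCCCCCCC", "AAAACCCC"]
dates = [date(2019, 1, 1), date(2019, 6, 1), date(2020, 1, 1)]
result = scan(seqs, dates)
print(result)  # {(0, 1, 2): 4}
```

In this toy example the date filter leaves a single triplet out of the three possible child assignments, which illustrates how collection dates can shrink the search space before any scoring is done.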