Abstract

The number of metagenomic studies conducted each year is growing dramatically, and storing and analyzing such large datasets is difficult and time-consuming. Interestingly, analyses show that environmental and human metagenomes include a significant amount of non-annotated sequences, representing a ‘dark matter.’ We established a bioinformatics pipeline that automatically detects metagenome reads matching query sequences from a given set, and applied this tool to the detection of sequences matching large and giant DNA viruses of the proposed order Megavirales or virophages. A total of 1,045 environmental and human metagenomes (≈ 1 terabase) were collected, processed, and stored on our bioinformatics server. In addition, nucleotide and protein sequences from 93 Megavirales representatives, including 19 giant viruses of amoeba, and from 5 virophages were collected. The pipeline, named MG-Digger, was built from scripts written in Python. Metagenomes previously found to contain megavirus-like sequences were tested as controls. MG-Digger was able to annotate hundreds of metagenome sequences as best matching those of giant viruses. These sequences were most often found to be similar to phycodnavirus or mimivirus sequences, but included reads related to the recently described pandoraviruses, Pithovirus sibericum, and faustoviruses. Compared to other tools, MG-Digger combined stand-alone use on Linux or Windows operating systems through a user-friendly interface, ready-to-use customized metagenome and query sequence databases, adjustable parameters for BLAST searches, and creation of output files containing the selected reads with their best match identification. Compared to Metavir 2, a reference tool in viral metagenome analysis, MG-Digger detected 8% more true positive Megavirales-related reads in a control metagenome.
The present work shows that massive, automated and recurrent analyses of metagenomes are effective in improving knowledge about the presence and prevalence of giant viruses in the environment and the human body.

Highlights

  • The first giant virus of amoeba, Mimivirus, was isolated in 2003 from a water sample by co-culturing on Acanthamoeba polyphaga, a strategy implemented to find Legionella-like bacteria (La Scola et al, 2003; Raoult et al, 2007)

  • The pipeline dedicated to the search for giant virus-related sequences in metagenomes comprises several scripts written in Python and includes independent modules (Figure 1)

  • MG-Digger, a user-friendly computational tool implemented in our laboratory for the detection of Megavirales-like or virophage-like sequences in metagenomes, automatically generated ready-to-analyze metagenome files and annotated hundreds of sequences as significantly matching those of giant viruses or virophages
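
The step of selecting reads and reporting their best match can be illustrated with a minimal Python sketch. It is not MG-Digger's code: it assumes BLAST results in the standard 12-column tabular format (`-outfmt 6`), treats the first column as the metagenome read and the second as the reference sequence, and uses a hypothetical function name (`best_hits`) and an arbitrary default e-value cutoff, since the source does not state the thresholds used.

```python
import csv
import io

# Column names of BLAST's default tabular output (-outfmt 6).
BLAST_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {read_id: (reference_id, evalue, bitscore)}.

    For each read, keep the hit with the highest bit score among the hits
    whose e-value does not exceed max_evalue (an illustrative, adjustable
    cutoff; not a value taken from the paper).
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(BLAST_COLUMNS, row))
        evalue = float(hit["evalue"])
        bitscore = float(hit["bitscore"])
        if evalue > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or bitscore > current[2]:
            best[hit["qseqid"]] = (hit["sseqid"], evalue, bitscore)
    return best


# Made-up example records; the identifiers are placeholders, not real data.
example = "\n".join([
    "read_1\tmimivirus_ref\t85.0\t100\t15\t0\t1\t100\t500\t599\t1e-20\t120.0",
    "read_1\tphycodnavirus_ref\t80.0\t100\t20\t0\t1\t100\t10\t109\t1e-10\t90.0",
    "read_2\tpandoravirus_ref\t70.0\t60\t18\t0\t1\t60\t40\t99\t0.5\t30.0",
])
hits = best_hits(example)
```

In this made-up input, `read_1` is assigned to its higher-scoring reference and `read_2` is discarded because its e-value exceeds the cutoff. Exposing the cutoff as a parameter mirrors the adjustable BLAST search parameters mentioned above.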


Introduction

The first giant virus of amoeba, Mimivirus, was isolated in 2003 from a water sample by co-culturing on Acanthamoeba polyphaga, a strategy implemented to find Legionella-like bacteria (La Scola et al, 2003; Raoult et al, 2007). The giant viruses of amoebae described since then comprise new viral families, including Mimiviridae (La Scola et al, 2008; Pagnier et al, 2013) and Marseilleviridae (Boyer et al, 2009; Colson et al, 2012b; Pagnier et al, 2013), and two new putative viral families represented by pandoravirus isolates (currently the largest known viruses; Philippe et al, 2013) and Pithovirus sibericum (Legendre et al, 2014). These giant viruses were related to the group of nucleocytoplasmic large DNA viruses (NCLDVs), described since 2001 as being composed of five viral families: Ascoviridae, Asfarviridae, Iridoviridae, Phycodnaviridae, and Poxviridae, whose members infect a wide variety of eukaryotic hosts (Iyer et al, 2001; Yutin et al, 2009). The sizes of these virions and of their gene complements have changed our view of the viral world and its diversity, and have called into question the definition and classification of viruses (Raoult and Forterre, 2008; Colson et al, 2012a; Raoult, 2014).

