Abstract

Modern DNA sequencing has instituted a new era in human cytomegalovirus (HCMV) genomics. A key development has been the ability to determine the genome sequences of HCMV strains directly from clinical material. This involves the application of complex and often non-standardized bioinformatics approaches to analysing data of variable quality in a process that requires substantial manual intervention. To relieve this bottleneck, we have developed GRACy (Genome Reconstruction and Annotation of Cytomegalovirus), an easy-to-use toolkit for analysing HCMV sequence data. GRACy automates and integrates modules for read filtering, genotyping, genome assembly, genome annotation, variant analysis, and data submission. These modules were tested extensively on simulated and experimental data and outperformed generic approaches. GRACy is written in Python and is embedded in a graphical user interface with all required dependencies installed by a single command. It runs on the Linux operating system and is designed to allow the future implementation of a cross-platform version. GRACy is distributed under a GPL 3.0 license and is freely available at https://bioinformatics.cvr.ac.uk/software/ with the manual and a test dataset.

Highlights

  • Human cytomegalovirus (HCMV; species Human betaherpesvirus 5) infects 60–70 per cent of adults in developed countries and up to 100 per cent in developing countries (Zuhair et al 2019)

  • We have developed GRACy (Genome Reconstruction and Annotation of Cytomegalovirus), an easy-to-use toolkit for analysing human cytomegalovirus (HCMV) sequence data

  • Among the simulated datasets, two had even coverage (EC) depth across the HCMV genome and two had uneven coverage (UC) depth, in order to resemble typical experimental data, whose coverage is strongly influenced by local variation in the efficiency of target enrichment and polymerase chain reaction (PCR) amplification
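The difference between even and uneven coverage depth can be illustrated with a small simulation. This is a minimal sketch, not the procedure used to build the GRACy test datasets: it assumes fixed-length reads placed on a linear genome, and `patchy_weights` is a hypothetical model in which each segment of the genome receives a random log-normal multiplier standing in for local enrichment and PCR efficiency. All function names and parameter values are illustrative.

```python
import random
import statistics


def simulate_depth(genome_len, read_len, n_reads, start_weights=None, seed=0):
    """Return per-position depth for fixed-length reads on a linear genome.

    start_weights=None places read starts uniformly (even coverage);
    otherwise start_weights[i] is the relative chance a read starts at i.
    """
    rng = random.Random(seed)
    starts = rng.choices(range(genome_len - read_len + 1),
                         weights=start_weights, k=n_reads)
    depth = [0] * genome_len
    for s in starts:
        for pos in range(s, s + read_len):
            depth[pos] += 1
    return depth


def patchy_weights(n_positions, segment_len, sigma, seed=0):
    """Hypothetical uneven-efficiency model: one log-normal multiplier
    shared by every start position within a segment of segment_len."""
    rng = random.Random(seed)
    weights = []
    while len(weights) < n_positions:
        weights.extend([rng.lognormvariate(0.0, sigma)] * segment_len)
    return weights[:n_positions]


def coefficient_of_variation(depth):
    """Standard deviation of depth divided by mean depth."""
    mean = statistics.fmean(depth)
    return statistics.pstdev(depth) / mean if mean else 0.0


GENOME_LEN, READ_LEN, N_READS = 10_000, 100, 5_000
n_starts = GENOME_LEN - READ_LEN + 1
ec_depth = simulate_depth(GENOME_LEN, READ_LEN, N_READS, seed=1)
uc_depth = simulate_depth(GENOME_LEN, READ_LEN, N_READS,
                          patchy_weights(n_starts, 500, 1.0, seed=2), seed=1)
print(coefficient_of_variation(ec_depth), coefficient_of_variation(uc_depth))
```

Both simulated profiles contain the same total number of sequenced bases; the uneven one differs only in how those bases are distributed, and the coefficient of variation of depth is one simple way to quantify that difference.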


Summary

Introduction

Human cytomegalovirus (HCMV; species Human betaherpesvirus 5) infects 60–70 per cent of adults in developed countries and up to 100 per cent in developing countries (Zuhair et al 2019). Recombination during HCMV evolution has essentially obliterated genetic linkage and generated a huge number of different strains (Rasmussen et al 2003; Sijmons et al 2015; Suarez et al 2019a). This extensive diversity limits the effectiveness of reference-guided genome assembly and of automatic transfer of annotations from a reference. One approach to monitoring strain composition is to count the occurrences in the reads of a single sequence motif (21–24 nucleotides (nt) long) that is specific to each genotype of a hypervariable gene and conserved in all known sequences of that genotype (Suarez et al 2019a,b). We intend GRACy to provide an easy-to-use, expandable toolkit in support of HCMV genomics research.
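The motif-counting approach to strain composition can be sketched as follows. This is a minimal illustration of the idea, not GRACy's genotyping module: it assumes one motif per genotype, searches both strands of each read, and uses short made-up motifs (`G1`, `G2` and their sequences are hypothetical; real genotype-specific motifs are 21–24 nt long).

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_genotype_motifs(reads, motifs):
    """Count occurrences of each genotype-specific motif across all reads.

    reads:  iterable of DNA sequences
    motifs: dict mapping genotype name -> motif sequence
    Both strands are searched, since a read may derive from either strand.
    """
    counts = {genotype: 0 for genotype in motifs}
    # Precompute each motif's reverse complement once, not once per read.
    targets = {g: (m.upper(), reverse_complement(m.upper()))
               for g, m in motifs.items()}
    for read in reads:
        read = read.upper()
        for genotype, (fwd, rev) in targets.items():
            # str.count is non-overlapping; overlapping hits of one motif
            # within a single read are not counted separately.
            n = read.count(fwd)
            if rev != fwd:  # a palindromic motif would otherwise count twice
                n += read.count(rev)
            counts[genotype] += n
    return counts


def genotype_proportions(counts):
    """Convert raw motif counts into relative genotype proportions."""
    total = sum(counts.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    return {g: c / total for g, c in counts.items()}


# Hypothetical motifs and reads, for illustration only.
motifs = {"G1": "ACGGATTC", "G2": "TTGACCAG"}
reads = ["NNACGGATTCNN", "GAATCCGTAAAA", "TTGACCAGTTGACCAG", "AAAAAAAA", "acggattc"]
counts = count_genotype_motifs(reads, motifs)
print(counts)  # → {'G1': 3, 'G2': 2}
print(genotype_proportions(counts))
```

The second read matches `G1` only on the reverse strand, which is why both orientations are searched; the relative counts then serve as a rough estimate of how much of each genotype is present in a mixed-strain sample.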

Datasets
Performance statistics
Software implementation
Read filtering
Genome assembly
Genotyping
Genome annotation
Variant analysis
Database submission
Variant calling
Annotation
Findings
Conclusion