Abstract

Public health emergencies like SARS, MERS, and COVID-19 have prioritized surveillance of zoonotic coronaviruses, resulting in extensive genomic characterization of coronavirus diversity in bats. Sequencing viral genomes directly from animal specimens remains a laboratory challenge, however, and most bat coronaviruses have been characterized solely by PCR amplification of small regions from the best-conserved gene. This has resulted in limited phylogenetic resolution and left viral genetic factors relevant to threat assessment undescribed. In this study, we evaluated whether a technique called hybridization probe capture can achieve more extensive genome recovery from surveillance specimens. Using a custom panel of 20,000 probes, we captured and sequenced coronavirus genomic material in 21 swab specimens collected from bats in the Democratic Republic of the Congo. For 15 of these specimens, probe capture recovered more genome sequence than had been previously generated with standard amplicon sequencing protocols, providing a median 6.1-fold improvement (ranging up to 69.1-fold). Probe capture data also identified five novel alpha- and betacoronaviruses in these specimens, and their full genomes were recovered with additional deep sequencing. Based on these experiences, we discuss how probe capture could be effectively operationalized alongside other sequencing technologies for high-throughput, genomics-based discovery and surveillance of bat coronaviruses.

Editor's evaluation

This work applies hybrid-capture sequencing for coronavirus (CoV) surveillance in bats. Given that bats are a major reservoir for animal-to-human virus spillover events, which have caused several major epidemics/pandemics, this is a very important field of research. 
The reported hybrid-capture method shows some clear advantages over amplicon-based viral sequencing, which is the established standard in the field. This new approach has clear merits that are well supported by the data presented and is likely to become an important tool in viral surveillance programs that ultimately aim to predict/prevent/prepare for future pandemics. The work will be of interest to microbiologists, particularly those studying viruses or interested in genomics surveillance. https://doi.org/10.7554/eLife.79777.sa0

Introduction

Orthocoronavirinae, commonly known as coronaviruses (CoVs), are a diverse subfamily of RNA viruses that infect a broad range of mammals and birds (Corman et al., 2018; Ye et al., 2020; Ruiz-Aravena et al., 2021). Since the 1960s, four endemic human CoVs have been identified as common causes of mild respiratory illnesses (Corman et al., 2018; Ye et al., 2020). In the past two decades, additional CoV threats have emerged, most notably SARS-CoV, MERS-CoV, and SARS-CoV-2, causing severe disease, public health emergencies, and global crises (Drosten et al., 2003; Zaki et al., 2012; Hu et al., 2015; Corman et al., 2018; Ye et al., 2020; Zhou et al., 2020). These spill-overs have established CoVs alongside influenza A viruses as important zoonotic pathogens and pandemic threats. Indeed, evolving perceptions of CoV risk have led to speculation that some historical pandemics have been mis-attributed to influenza, and they may have in fact been the spill-overs of now-endemic human CoVs (Vijgen et al., 2005; Corman et al., 2018; Brüssow and Brüssow, 2021).

Emerging CoV threats have motivated extensive viral discovery and surveillance activities at the interface between humans, livestock, and wildlife (Drexler et al., 2014; Frutos et al., 2021; Geldenhuys et al., 2021). Many of these activities have focused on bats (order Chiroptera). 
They are the second-most diverse order of mammals, following rodents, and they are a vast reservoir of CoV diversity (Drexler et al., 2014; Hu et al., 2015; Frutos et al., 2021; Geldenhuys et al., 2021; Ruiz-Aravena et al., 2021). Bats have been implicated in the emergence of SARS-CoV, MERS-CoV, SARS-CoV-2, and, less recently, the endemic human CoVs NL63 and 229E (Li et al., 2005; Pfefferle et al., 2009; Tong et al., 2009; Huynh et al., 2012; Corman et al., 2015; Hu et al., 2015; Yang et al., 2015; Tao et al., 2017; Ye et al., 2020; Zhou et al., 2020; Ruiz-Aravena et al., 2021). Genomic sequencing has been instrumental for characterizing CoV diversity and potential zoonotic threats, but recovering viral genomes directly from animal specimens remains a laboratory challenge. Host tissues and microbiota contribute excessive background genomic material to specimens, diluting viral genome fragments and vastly increasing the sequencing depth required for target detection and accurate genotyping. Consequently, laboratory methods for targeted enrichment of viral genome material have been necessary for practical, high-throughput sequencing of surveillance specimens (Houldcroft et al., 2017; Fitzpatrick et al., 2021). There are two major paradigms for targeted enrichment of genomic material. The first, called amplicon sequencing, uses PCR to amplify target genomic material. It is comparatively straightforward and sensitive, but PCR chemistry limits amplicon length and relies on the presence of specific primer sites across diverse taxa (Houldcroft et al., 2017; Fitzpatrick et al., 2021). In practice, extensive genomic divergence within viral taxa often constrains amplicon locations to the most conserved genes, limiting phylogenetic resolution (Drexler et al., 2014; Li et al., 2020). This also hinders characterization of viral genetic factors relevant for threat assessment like those encoding determinants of host range, tissue tropism, and virulence. 
These kinds of targets are often hypervariable due to strong evolutionary pressures from host adaptation and immune evasion, and consequently they do not have well-conserved locations for PCR primers. Due to these limitations, studies of CoV diversity have been almost exclusively based on small regions of the relatively conserved RNA-dependent RNA polymerase (RdRp) gene (Drexler et al., 2014; Geldenhuys et al., 2021). The second major paradigm for enriching viral genomic material is called hybridization probe capture. This method uses longer nucleotide oligomers to anneal and immobilize complementary target genomic fragments while background material is washed away. Probes are typically 80–120 nucleotides in length, making them more tolerant of sequence divergence and nucleotide mismatches than PCR primers (Brown et al., 2016). Probe panels are also highly scalable, allowing for the simultaneous capture of thousands to millions of target sequences. This has made them popular for applications where diverse and hypervariable viruses are targeted (Bonsall et al., 2015; Briese et al., 2015; O’Flaherty et al., 2018; Wylezich et al., 2021; Wylie et al., 2015). Probe capture has only been occasionally used to attempt sequencing of bat CoVs, however (Lim et al., 2019; Li et al., 2020). In this study, we evaluated hybridization probe capture for enriching CoV genomic material in oral and rectal swabs previously collected from bats. We designed a custom panel of 20,000 hybridization probes targeting the known diversity of bat CoVs. This panel was applied to 21 swab specimens collected in the Democratic Republic of the Congo (DRC), in which novel CoVs had been previously characterized by partial RdRp sequencing using standard amplicon methods (Kumakamba et al., 2021). 
We compared the extent of genome recovery by probe capture and amplicon sequencing, and we used probe capture data in conjunction with deep metagenomic sequencing to characterize full genomes for five novel alpha- and betacoronaviruses. Based on these experiences, we discuss how probe capture could be effectively operationalized alongside other targeted sequencing technologies for high-throughput, genomics-based discovery and surveillance of bat CoVs.

Results

Custom hybridization probe panel provided broad coverage in silico of known bat CoV diversity

To begin this study, we designed a custom panel of hybridization probes targeting known bat CoV diversity. We obtained 4,852 bat CoV genomic sequences from GenBank, used them to design a custom panel of 20,000 probe sequences, then assessed in silico how extensively these reference sequences were covered by our custom panel (Figure 1A). For 90% of these bat CoV sequences, the custom panel covered at least 94.32% of nucleotide positions. We also evaluated probe coverage for the subset of these sequences representing full-length bat CoV genomes (Figure 1B), and 90% of these targets had at least 98.73% of their nucleotide positions covered. These results showed broad probe coverage of known bat CoV diversity at the time the panel was designed.

Figure 1. Custom hybridization probe panel provided broadly inclusive coverage of known bat coronavirus diversity in silico. Bat coronavirus (CoV) sequences were obtained by downloading all available alphacoronavirus, betacoronavirus, and unclassified coronaviridae and coronavirinae sequences from GenBank on 4 October 2020 and searching for bat-related keywords in sequence headers. A custom panel of 20,000 probes was designed to target these sequences using the makeprobes module in the ProbeTools package. The ProbeTools capture and stats modules were used to assess probe coverage of bat CoV reference sequences. 
(A) Each bat CoV sequence is represented as a dot plotted according to its probe coverage, that is, the percentage of its nucleotide positions covered by at least one probe in the custom panel. (B) The same analysis was performed on the subset of sequences representing full-length genomes (>25 kb in length).

Probe capture provided more extensive genome recovery than previous amplicon sequencing for most specimens

We used our custom panel to assess probe capture recovery of CoV material in 25 metagenomic sequencing libraries. We prepared these libraries from a retrospective collection of 21 bat oral and rectal swabs that had been collected in DRC between 2015 and 2018 (Kumakamba et al., 2021). These swabs had been collected as part of the PREDICT project, a large-scale United States Agency for International Development (USAID) Emerging Pandemic Threats initiative that has collected over 20,000 animal specimens from 20 CoV hotspot countries (e.g. Anthony et al., 2017; Lacroix et al., 2017; Nziza et al., 2020; Valitutto et al., 2020; Ntumvi et al., 2022). Most libraries (n=19) were prepared from archived RNA that had been previously extracted from these specimens, although some libraries (n=6) were prepared from RNA that was freshly extracted from archived primary specimens (Table 1). CoVs had been previously detected in these specimens with PCR assays by Quan et al., 2010, and Watanabe et al., 2010. Sanger sequencing of these amplicons by Kumakamba et al., 2021, had generated partial RdRp sequences of 286 or 387 nucleotides, which had been used to assign these specimens to four novel phylogenetic groups of alpha- and betacoronaviruses (Table 1).

Table 1. Bat specimens and sequencing libraries analysed in this study. Rectal and oral swabs collected for a previous study were used to evaluate hybridization probe capture (Kumakamba et al., 2021). For 19 swabs, archived RNA extracted during the previous study was assayed. 
For 6 swabs, freshly extracted RNA (using a conventional Trizol method) was assayed. As part of the previous study, Kumakamba et al. generated partial sequences from the RNA-dependent RNA polymerase gene, which were used to assign alpha- and betacoronaviruses in these specimens to four novel phylogenetic groups.

| Specimen ID | Library ID | Host | Swab type | RNA extraction method | Phylogenetic group |
|---|---|---|---|---|---|
| CDAB0017RSV | CDAB0017RSV-PRE | Micropteropus pusillus | Rectal | Previously extracted | W-Beta-2 |
| CDAB0040R | CDAB0040R-PRE | Myonycteris sp. | Rectal | Previously extracted | W-Beta-2 |
| CDAB0040RSV | CDAB0040RSV-PRE | Myonycteris sp. | Rectal | Previously extracted | W-Beta-2 |
| CDAB0305R | CDAB0305R-PRE | Micropteropus pusillus | Rectal | Previously extracted | W-Beta-2 |
| CDAB0146R | CDAB0146R-PRE | Eidolon helvum | Rectal | Previously extracted | W-Beta-3 |
| CDAB0158R | CDAB0158R-PRE | Eidolon helvum | Rectal | Previously extracted | W-Beta-3 |
| CDAB0160R | CDAB0160R-PRE | Eidolon helvum | Rectal | Previously extracted | W-Beta-3 |
| CDAB0173R | CDAB0173R-PRE | Eidolon helvum | Rectal | Previously extracted | W-Beta-3 |
| CDAB0174R | CDAB0174R-PRE | Eidolon helvum | Rectal | Previously extracted | W-Beta-3 |
| CDAB0203R | CDAB0203R-PRE | Eidolon helvum | Rectal | Previously extracted | W-Beta-3 |
| CDAB0212R | CDAB0212R-PRE | Eidolon helvum | Rectal | Previously extracted | W-Beta-3 |
| CDAB0217R | CDAB0217R-PRE | Eidolon helvum | Rectal | Previously extracted | W-Beta-3 |
| CDAB0113RSV | CDAB0113RSV-PRE | Hipposideros cf. ruber | Rectal | Previously extracted | W-Beta-4 |
| CDAB0486R | CDAB0486R-PRE | Chaerephon sp. | Rectal | Previously extracted | Q-Alpha-4 |
| CDAB0488R | CDAB0488R-PRE | Mops condylurus | Rectal | Previously extracted | Q-Alpha-4 |
| CDAB0488R | CDAB0488R-TRI | Mops condylurus | Rectal | Trizol re-extraction | Q-Alpha-4 |
| CDAB0491R | CDAB0491R-PRE | Mops condylurus | Rectal | Previously extracted | Q-Alpha-4 |
| CDAB0491R | CDAB0491R-TRI | Mops condylurus | Rectal | Trizol re-extraction | Q-Alpha-4 |
| CDAB0492R | CDAB0492R-PRE | Mops condylurus | Rectal | Previously extracted | Q-Alpha-4 |
| CDAB0492R | CDAB0492R-TRI | Mops condylurus | Rectal | Trizol re-extraction | Q-Alpha-4 |
| CDAB0494O | CDAB0494O-TRI | Mops condylurus | Oral | Trizol re-extraction | Q-Alpha-4 |
| CDAB0494R | CDAB0494R-PRE | Mops condylurus | Rectal | Previously extracted | Q-Alpha-4 |
| CDAB0494R | CDAB0494R-TRI | Mops condylurus | Rectal | Trizol re-extraction | Q-Alpha-4 |
| CDAB0495O | CDAB0495O-PRE | Mops condylurus | Oral | Previously extracted | Q-Alpha-4 |
| CDAB0495R | CDAB0495R-TRI | Mops condylurus | Rectal | Trizol re-extraction | Q-Alpha-4 |

We captured CoV genomic material in these metagenomic bat swab libraries with our custom probe panel then performed genomic sequencing (Table 2). To assess CoV recovery, we began with a strategy that would be suitable for automated bioinformatic analysis in high-throughput surveillance settings: sequencing reads from probe captured libraries were assembled de novo into contigs, then CoV sequences were identified by locally aligning contigs against a database of CoV reference sequences. In total, 113 CoV contigs were recovered from 17 of 25 libraries. We compared contig lengths to the partial RdRp amplicons that had been previously generated for these specimens (Figure 2A). The protocol by Watanabe et al. had generated 387 nucleotide-long partial RdRp sequences, but median contig size with probe capture for these specimens was 696 nucleotides (IQR: 453–1051 nucleotides, max: 19,601 nucleotides). The protocol by Quan et al. 
had generated 286 nucleotide-long partial RdRp sequences, but median contig size with probe capture for these specimens was 602 nucleotides (IQR: 423–1053 nucleotides, max: 4240 nucleotides). Overall, 107 contigs (93.8%) were longer than the partial RdRp sequence previously generated for their specimen by standard amplicon sequencing protocols, demonstrating the capacity of probe capture to recover larger contiguous fragments of CoV genome sequence. We also assessed nucleotide sequence concordance; for specimens where the partial RdRp amplicon sequence was successfully assembled, nucleotide identities ranged from 99.3% to 100% (median = 100%, maximum two mismatches).

Figure 2. De novo assembly of probe captured libraries yielded more genome sequence than standard amplicon sequencing methods for most specimens. Reads from probe captured libraries were assembled de novo with coronaSPAdes, and coronavirus contigs were identified by local alignment against a database of all coronaviridae sequences in GenBank. (A) The size distribution of contigs from all libraries is shown. Dots are coloured to indicate whether the length of the contig exceeded partial RNA-dependent RNA polymerase (RdRP) gene amplicons previously sequenced from these specimens. (B) Total assembly size and assembly N50 distributions for all libraries. (C) Each contig is represented as a dot plotted according to its length. Assembly N50 sizes and total assembly sizes are indicated by the height of their bars.

Table 2. Sequencing metrics for probe captured libraries. Total reads and sequencing output were measured for each library. Raw metrics describe unprocessed FASTQ files directly from the sequencer. Valid metrics describe FASTQ files following pre-processing to trim adapters, trim trailing low-quality bases, remove index hops, and remove PCR chimeras. 
On-target metrics were estimated by mapping valid data to the coronavirus reference sequence selected for each specimen and to the contigs assembled from each specimen.

| Library ID | Raw reads (#) | Raw output (kb) | Valid reads (#) | Valid output (kb) | Mapped reads (#) | Mapped size (kb) |
|---|---|---|---|---|---|---|
| CDAB0017RSV-PRE | 115,280 | 14,225.8 | 47,609 | 7362 | 36,716 | 4919 |
| CDAB0040R-PRE | 37,950 | 4708.9 | 695 | 121.1 | 0 | 0 |
| CDAB0040RSV-PRE | 373,254 | 45,333.7 | 176,783 | 26,783.3 | 48,136 | 7068.4 |
| CDAB0113RSV-PRE | 31,394 | 4261 | 16,861 | 2875 | 16,302 | 2772.2 |
| CDAB0146R-PRE | 11,870 | 1520 | 193 | 23.7 | 186 | 22.6 |
| CDAB0158R-PRE | 48,014 | 6422.5 | 4189 | 706.9 | 1548 | 239.8 |
| CDAB0160R-PRE | 83,524 | 9948.5 | 2513 | 376.6 | 1363 | 191.2 |
| CDAB0173R-PRE | 10,628 | 1403 | 206 | 34.4 | 203 | 33.7 |
| CDAB0174R-PRE | 900,578 | 118,821 | 107,979 | 17,525.4 | 82,679 | 13,290.9 |
| CDAB0203R-PRE | 6,832,218 | 849,284.9 | 1,186,186 | 188,188.9 | 456,474 | 68,384.7 |
| CDAB0212R-PRE | 60,838 | 7526 | 4158 | 681.4 | 4152 | 678.3 |
| CDAB0217R-PRE | 20,078,142 | 2,617,427.3 | 8,946,935 | 1,467,955.3 | 5,173,448 | 815,381.3 |
| CDAB0305R-PRE | 27,054 | 3182.7 | 3971 | 594.9 | 1787 | 250 |
| CDAB0486R-PRE | 442,456 | 58,326.7 | 56,838 | 9377 | 20,687 | 3385.1 |
| CDAB0488R-PRE | 188,294 | 24,679 | 2913 | 506.2 | 2867 | 493.1 |
| CDAB0488R-TRI | 343,916 | 45,867.1 | 8415 | 1381.2 | 8225 | 1346.2 |
| CDAB0491R-PRE | 791,120 | 96,136.3 | 46,509 | 7081.8 | 45,289 | 6854.6 |
| CDAB0491R-TRI | 1,561,144 | 204,995.4 | 173,533 | 29,280.5 | 157,889 | 26,421.2 |
| CDAB0492R-PRE | 3,453,456 | 448,217.9 | 277,176 | 48,023.8 | 185,665 | 31,641.3 |
| CDAB0492R-TRI | 4,200,520 | 518,837.7 | 139,804 | 20,074.7 | 93,442 | 13,294.3 |
| CDAB0494O-TRI | 141,494 | 18,980.9 | 290 | 49.7 | 60 | 11.3 |
| CDAB0494R-PRE | 82,360 | 11,162.7 | 22 | 4 | 0 | 0 |
| CDAB0494R-TRI | 95,924 | 12,762.1 | 9 | 2.1 | 0 | 0 |
| CDAB0495O-PRE | 27,074 | 3776 | 0 | 0 | 0 | 0 |
| CDAB0495R-TRI | 470,850 | 63,267 | 8896 | 1440.9 | 8672 | 1399.2 |

Next, we used assembly size metrics to assess the extent to which these contigs represented complete genomes. The median total assembly size was 1724 nucleotides (IQR: 0–5834 nucleotides), while median assembly N50 size was 533 nucleotides (IQR: 0–908 nucleotides) (Figure 2B). This assembly size-based assessment of genome completeness had limitations, however. Some assembly sizes may have been understated by genome regions with comparatively low read coverage that failed to assemble. 
Conversely, other assembly sizes may have been overstated by redundant contigs resulting from forked assembly graphs, either due to genetic variation within the intrahost viral population or due to polymerase errors introduced during library construction and probe capture. For instance, the total assembly size for library CDAB0217R-PRE was 33,195 nucleotides, exceeding the length of the longest known CoV genome (Figure 2C). Another limitation of this analysis was that these assembly metrics provided no indication of which regions of the genome had been recovered.

To address these limitations, we also applied a reference sequence-based strategy. We used the contigs to identify the best available CoV reference sequences for each of the four novel phylogenetic groups to which these specimens had been assigned. Sequencing reads from captured libraries were directly mapped to these reference sequences and the contigs we had assembled de novo were also locally aligned to them (Figure 3 and Figure 3—figure supplements 1–4). Based on these read mappings and contig alignments, we calculated for each library a breadth of reference sequence recovery, that is, the number of nucleotide positions in the reference sequence covered by either mapped sequencing reads or contigs (Figure 4A). The number of reads mapped to these reference sequences and contigs was also used to estimate on-target rates for these libraries (Table 2).

Figure 3 (with 4 supplements). Coverage of reference sequences by probe captured libraries was used to assess extent and location of recovery. Reference sequences were chosen for each previously identified phylogenetic group (indicated in panel titles). Coverage of these reference sequences was determined by mapping reads and aligning contigs from probe captured libraries. Dark grey profiles show depth of read coverage along reference sequences. Blue shading indicates spans where contigs aligned. 
The locations of spike and RNA-dependent RNA polymerase (RdRP) genes are indicated in each reference sequence and shaded light grey. This figure shows the six libraries with the most extensive reference sequence coverage. Similar plots are provided as figure supplements for all libraries where any coronavirus sequence was recovered (Figure 3—figure supplements 1–4).

Figure 4. Probe captured libraries provided more extensive coverage of reference genomes than standard amplicon sequencing protocols for most specimens. Reference sequences were selected for the previously identified phylogenetic groups to which these specimens had been assigned by Kumakamba et al., 2021. (A) Coverage of these reference sequences was determined by mapping reads and aligning contigs from probe captured libraries. Each library is represented as a dot, and dots are coloured according to whether reference sequence coverage exceeded the length of the partial RNA-dependent RNA polymerase (RdRP) gene sequence that had been previously generated by amplicon sequencing. (B) The number of reference sequence positions covered by probe captured libraries was divided by the length of the partial RdRP amplicon sequences from these specimens. This provided the fold-difference in recovery between probe capture and standard amplicon sequencing methods. (C) Percent coverage of the spike and RdRP genes was calculated for each specimen.

The median breadth of reference sequence recovery for all libraries was 2376 nucleotides (IQR: 306–9446 nucleotides). The largest share of libraries (48%) represented specimens from phylogenetic group Q-Alpha-4, which had a median reference sequence recovery of 6497 nucleotides (IQR: 733–9802 nucleotides, max: 12,673 nucleotides). 
Phylogenetic group W-Beta-3 also accounted for a substantial fraction of libraries (32%), and although median reference sequence recovery was lower than for Q-Alpha-4 (2427 nucleotides), W-Beta-3 provided the libraries with the most extensive reference sequence recoveries (IQR: 780–19,286 nucleotides, max: 26,755 nucleotides).

As a simple way to quantify differences in recovery of CoV genome sequence between probe capture and amplicon sequencing, we calculated the ratio between the breadth of reference sequence recovery and the length of the previously generated partial RdRp amplicon sequence for each library (Figure 4B). The median ratio was 6.1-fold (IQR: 0.8-fold to 33.0-fold), reaching a maximum of 69.1-fold. Probe capture recovery was greater for 18 of 25 libraries (72%), representing 15 of 21 specimens (71%).

We also used reference sequence coverage to estimate the completeness of recovery for the RdRp and spike genes (Figure 4C). Overall, RdRp was more completely recovered than spike. Furthermore, following the overall extent of recovery trend observed in Figure 4A, recovery of RdRp and spike was more complete for viruses from phylogenetic group Q-Alpha-4 than the betacoronavirus groups, although multiple complete RdRp genes were recovered from both Q-Alpha-4 and W-Beta-3 groups. No complete spike genes were recovered.

Probe capture recovery limited by in vitro sensitivity

No CoV sequences were recovered from 4 of 25 libraries (representing three specimens), despite partial RdRp sequences being obtained from them previously. Furthermore, probe capture did not yield any complete CoV genomes, and many specimens displayed scattered and discontinuous reference sequence coverage (Figure 3—figure supplements 1–4). We considered two explanations for this result. First, CoV material in these libraries may not have been completely captured because it was not targeted by any probe sequences in the panel. 
Second, CoV material in these specimens may not have been incorporated into the sequencing libraries due to factors limiting in vitro sensitivity, for example, low prevalence of viral genomic material; suboptimal nucleic acid concentration and integrity in archived RNA and primary specimens; and library preparation reaction inefficiencies.

First, we assessed in vitro sensitivity. To exclude missing probe coverage as a confounder in this analysis, we evaluated recovery of the previously sequenced partial RdRp amplicons. Since their sequences were known, we could assess probe coverage in silico and demonstrate whether these targets were covered by the panel. All partial RdRp amplicons had at least 95.3% of their nucleotide positions covered by the probe panel (Figure 5A), but this did not translate into extensive recovery. For 12 of 25 libraries, no part of the partial RdRp sequence was recovered, and full/nearly full recovery (>95%) of the partial RdRp sequence was achieved for only 7 of 25 libraries (Figure 5A). These results demonstrated that genome recovery had been limited by factors other than probe panel inclusivity.

Figure 5. Recovery of coronavirus (CoV) genomic material was limited in vitro by method sensitivity. (A) Sensitivity was assessed by evaluating recovery of partial RNA-dependent RNA polymerase (RdRp) gene regions that had been previously sequenced in these specimens by amplicon sequencing. Probe coverage of partial RdRp sequences was assessed in silico to exclude insufficient probe design as an alternate explanation for incomplete recovery of these targets. (B) Input RNA concentration, RNA integrity numbers (RINs), and CoV genome abundance were measured for each specimen. The impact of these specimen characteristics on recovery by probe capture (as measured by reference sequence coverage) was assessed using Spearman’s rank correlation (test results stated in plots). 
An outlier was omitted from this analysis: RNA concentration for specimen CDAB0160R was recorded as 190 ng/μl, a value 4.7 SDs from the mean of the distribution.

Next, we examined nucleic acid concentration and integrity, two specimen characteristics associated with successful library preparation. Median RNA integrity number (RIN) values and RNA concentrations for these specimens were low: 1.1 and 14 ng/μl respectively, as was expected from archived material (Figure 5B). To assess the impact of RIN and RNA concentration on probe capture recovery, we compared these specimen characteristics against breadth of reference sequence recovery from the corresponding libraries (Figure 5B). Weak monotonic relationships were observed, with lower RNA concentration and lower RIN values generally leading to worse genome recovery. This relationship was significant for RNA concentration (p=0.045, Spearman’s rank correlation), but not for RNA integrity despite trending towards significance (p=0.053, Spearman’s rank correlation).

These weak associations suggested additional factors hindered recovery, for example, low prevalence of viral material or missing probe coverage for genomic regions outside the partial RdRp target. Using the previously generated partial RdRp sequences, we designed RT-qPCR assays to estimate CoV genome copies in these specimens (Figure 5B). The median abundance of viral material was 0.26 million genome copies/μl. There was a strong and significant monotonic relationship between viral abundance and extent of genome recovery (p<0.0001, Spearman’s rank correlation).

Inclusivity of custom probe panel against CoV taxa in study specimens

Next, we considered if blind spots in the probe panel had contributed to incomplete genome recovery from these specimens. This inquiry suffered a counterfactual problem: to assess whether the CoV taxa in our specimens were fully covered by our probe panel, we would need their complete genome sequences. 
We did not have their full genome sequences, however, because the probes did not recover them. Instead, we evaluated probe coverage of the reference sequences assigned to each phylogenetic group, assuming they were the available CoV sequences most similar to those in our specimens. Probe coverage was nearly complete for all reference sequences (Figure 6). Nonetheless, reference sequence recovery did not exceed 92.3% for any of these libraries, and complete spike genes were conspicuously absent (Figure 3, Figure 3—figure supplement 1, Figure 3—figure supplement 2, Figure 3—figure supplement 3, Figure 3—figure supplement 4). This included specimens like CDAB0203R-PRE, CDAB0217R-PRE, and CDAB0492R-PRE where recovery was otherwise extensive and contiguous, suggesting genomic material was sufficiently abundant and intact for sensitive library construction. These results indicated the presence of CoVs similar to bat CoV CMR704-P12 and Chaerephon bat coronavirus/Kenya/KY22/2006, except with novel spike genes that diverged from the spike genes of these reference sequences and all other CoVs described in GenBank.

Figure 6. In silico assessment of probe panel coverage for reference genomes. Reference sequences were chosen for each previously identified phylogenetic group (indicated in panel titles). Blue profiles show the number of probes covering each nucleotide position along the reference sequence. Probe coverage, that is, the percentage of nucleotide positions covered by at least one probe, is stated in panel titles. Ambiguity nucleotides (Ns) are shaded in orange, and these positions wer

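The Results rely on a few simple summary metrics: coverage as the number or percentage of reference positions covered by at least one probe, read, or contig; assembly N50; and the fold-difference between breadth of recovery and amplicon length. The following is a minimal Python sketch of those calculations, not the authors' pipeline (which used ProbeTools and coronaSPAdes). All function names are hypothetical, and alignments are assumed to be already reduced to half-open (start, end) intervals in reference coordinates.

```python
from typing import Iterable, List, Tuple

# Assumption: each probe, read, or contig alignment is a half-open
# interval [start, end) in reference coordinates.
Interval = Tuple[int, int]


def covered_positions(intervals: Iterable[Interval]) -> int:
    """Count reference positions covered by at least one interval."""
    total = 0
    block_start = block_end = None
    for start, end in sorted(intervals):
        if end <= start:
            continue  # skip empty or inverted intervals
        if block_end is None or start > block_end:
            # Disjoint from the running block: close it, open a new one.
            if block_end is not None:
                total += block_end - block_start
            block_start, block_end = start, end
        else:
            # Overlapping or adjacent: extend the running block.
            block_end = max(block_end, end)
    if block_end is not None:
        total += block_end - block_start
    return total


def percent_coverage(intervals: Iterable[Interval], reference_length: int) -> float:
    """Percentage of reference positions covered by at least one interval."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return 100.0 * covered_positions(intervals) / reference_length


def n50(contig_lengths: List[int]) -> int:
    """Largest length L such that contigs of length >= L hold at least
    half of the total assembly size; 0 for an empty assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0


def fold_difference(breadth: int, amplicon_length: int) -> float:
    """Ratio of breadth of reference recovery to amplicon length."""
    if amplicon_length <= 0:
        raise ValueError("amplicon_length must be positive")
    return breadth / amplicon_length
```

With illustrative inputs, `covered_positions([(0, 100), (50, 150), (300, 400)])` counts the overlapping region once, and `fold_difference(1935, 387)` compares a hypothetical breadth of 1935 positions with a 387-nucleotide amplicon. Taking the union of intervals is what keeps overlapping reads and contigs from being double-counted, which is the distinction the text draws between breadth of recovery and total assembly size.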