Saliva is often the specimen of choice when a non-invasive sample type is needed. Properly collected, saliva is more stable and more convenient to ship than blood. However, DNA extracted from saliva is notorious for bacterial contamination, with bacterial content as high as 50% in some collections. While a high percentage of bacterial DNA may not interfere with many genetic tests, it is an obstacle for genome sequencing (GS) because bacterial reads consume valuable sequencing capacity. We aimed to design and implement cost-effective processes that would enable us to offer PCR-free GS using saliva-derived DNA as a clinical test. Saliva specimens were provided by volunteers from the HudsonAlpha Institute for Biotechnology after written informed consent was obtained. Saliva specimens were collected with Oragene OGD-600 and OGD-675 kits (DNA Genotek). DNA was extracted with the QIAsymphony SP instrument. Libraries for PCR-free GS were generated using 0.5-1.5 μg of saliva-derived DNA with the TruSeq DNA PCR-Free kit and IDT for Illumina TruSeq DNA Indices (Illumina). Sequencing was performed on the NovaSeq 6000 System with an S4 flow cell and v1.5 reagent kit. Data processing was performed with an in-house secondary pipeline that uses BWA-MEM (via Sentieon) for alignment, Strelka2 for variant calling, and Kraken2 for measuring percent human DNA versus percent bacterial contamination. Tertiary analysis was performed using an in-house tool, Codicem. Several development and optimization experiments were performed to assess: 1) input requirements for PCR-free GS from saliva-derived DNA; 2) the correlation between percent aligned reads and percent human DNA; and 3) the genome coverage needed to accurately assess bacterial contamination. We observed that an input of 0.5-1 μg can be used for saliva-derived DNA, consistent with inputs for blood-derived DNA.
We also found that an initial low-coverage run (1X) for saliva-derived DNA could be used to estimate the bacterial contamination of a sample, and our analysis pipeline accurately estimated percent human DNA at coverage as low as 0.5X. Percent aligned reads varied widely between individuals because of high variability in bacterial DNA contamination. As expected, percent aligned reads correlated with percent human DNA (R² = 0.998). The estimate of bacterial content could then be used to process samples more efficiently by informing the amount of library required to achieve the desired coverage. We further investigated whether a different saliva collection method could improve sequencing efficiency. A sponge collection kit (OGD-675) showed substantially lower bacterial contamination, with higher percent human DNA and percent aligned reads, than a spit collection kit (OGD-600). On average, human DNA content was 95.44% in sponge kits compared to 85.15% in spit kits. Furthermore, the distribution of percent aligned reads in GS data from specimens collected with sponge kits was much narrower than that for specimens collected with spit kits (83.47-99.46% compared to 46.44-97.09%). Despite smaller collection volumes and correspondingly lower total DNA yields in sponge kits (average of 7.8 μg compared to 12 μg in spit kits), the DNA extracted from sponge kits was sufficient for primary and confirmatory testing. A supplementary stability study using DNA isolated 0 and 31 days after collection showed that bacterial contamination did not significantly increase over time for either sponge or spit collection kits (average increase from 3.01% to 3.56% in sponge kits and from 10.27% to 12.31% in spit kits).
However, we noted that percent aligned reads decreased more during the 30 days of storage in samples with high bacterial contamination at initial extraction, with up to an 8% increase in bacterial contamination observed for samples with initial human DNA content below 60%. Consequently, we observed a narrower distribution of changes in percent aligned reads over time in DNA isolated from sponge collection kits (0.01-1.58% compared to 0.1-8.8% for spit kits). The HudsonAlpha Clinical Services Lab has validated a PCR-free GS assay with saliva-derived DNA as input. Workflow efficiency is increased by a preliminary sequencing run to 1X coverage, which is used to estimate bacterial contamination. This step allows us to triage samples and eliminate failures due to insufficient coverage, as well as to balance sequencing pools to achieve the desired coverage. Our studies show that a sponge saliva collection kit provides sufficient DNA yields with relatively low bacterial contamination, making the sponge kit the preferred collection method for GS applications.
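The pool-balancing step described above can be illustrated with a minimal Python sketch. It is not the lab's pipeline. It assumes that human coverage scales linearly with the human fraction of reads estimated from the low-coverage run, and the function names, the sample labels, and the 30X target are hypothetical choices made for illustration. The two human fractions are the kit averages reported in this abstract.

```python
from typing import Dict


def required_total_coverage(target_human_coverage: float, human_fraction: float) -> float:
    """Total coverage to sequence so that the human portion reaches the target.

    Assumes human coverage = total coverage * human_fraction (a simplification).
    """
    if not 0.0 < human_fraction <= 1.0:
        raise ValueError("human_fraction must be in (0, 1]")
    return target_human_coverage / human_fraction


def pool_weights(human_fractions: Dict[str, float]) -> Dict[str, float]:
    """Relative share of a sequencing pool per sample for equal human coverage.

    Each sample is weighted by 1 / human_fraction, then weights are normalized
    so that they sum to 1.
    """
    raw = {name: required_total_coverage(1.0, frac) for name, frac in human_fractions.items()}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


if __name__ == "__main__":
    # Hypothetical sample labels; fractions are the reported kit averages.
    fractions = {"sponge_avg": 0.9544, "spit_avg": 0.8515}
    target = 30.0  # assumed target human coverage, not stated in the abstract
    for name, frac in fractions.items():
        print(name, round(required_total_coverage(target, frac), 2))
    print(pool_weights(fractions))
```

Under these assumptions, a sample with a lower human fraction receives a proportionally larger share of the pool, which is one way a 1X estimate could inform how much library to load.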