Defective viral genomes (DVGs) are variants of the wild-type (wt) virus that cannot autonomously complete an infectious cycle. However, in the presence of their parental (helper) wt virus, DVGs can interfere with the replication, encapsidation and spread of functional genomes, acting as a significant selective force in viral evolution. DVGs also affect the host's immune responses and are linked to chronic infections and milder symptoms. Thus, identifying and characterizing DVGs is crucial for understanding infection prognosis. Quantifying DVGs is challenging because they cannot sustain themselves without the helper virus, which makes the two difficult to distinguish, especially with high-throughput RNA sequencing (RNA-seq). Accurate quantification is essential for understanding their highly dynamic interactions with the helper virus. We present a method to simultaneously estimate the abundances of DVGs and wt genomes within a sample by identifying genomic regions whose sequencing depth deviates significantly from the expected depth. Our approach reconstructs the depth profile through a linear system of equations, whose solution provides an estimate of the number of wt genomes and of DVGs of each type. Until now, in silico methods have only estimated the DVG-to-wt ratio for localized genomic regions. This is the first method that simultaneously estimates the proportions of wt genomes and DVGs genome-wide from short-read RNA sequencing. The MATLAB code and the synthetic datasets are freely available at https://github.com/jmusan/wtDVGquantific.
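To make the idea of reconstructing the depth profile through a linear system concrete, the following Python sketch may help. It is not the authors' MATLAB implementation. It assumes that the genome has already been split into four segments, that sequencing depth is uniform along any single genome, and that two hypothetical deletion DVG types are known in advance. The segment layout, the depth values and the function name `solve_least_squares` are all invented for illustration. Each row of the system states that the observed depth in a segment equals the summed abundance of the genome types that still contain that segment.

```python
def solve_least_squares(A, d):
    """Solve min ||A x - d||^2 via the normal equations and Gauss-Jordan elimination."""
    n = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T d].
    M = [
        [sum(row[i] * row[j] for row in A) for j in range(n)]
        + [sum(row[i] * di for row, di in zip(A, d))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("genome types are not distinguishable from depth alone")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


# Hypothetical layout: rows are genome segments, columns are genome types
# (wt, DVG_A lacking segment 2, DVG_B lacking segments 2 and 3).
# An entry is 1 when that genome type contains the segment.
coverage = [
    [1, 1, 1],  # segment 1: present in every type
    [1, 0, 0],  # segment 2: deleted in DVG_A and DVG_B
    [1, 1, 0],  # segment 3: deleted in DVG_B only
    [1, 1, 1],  # segment 4: present in every type
]

# Invented mean depth per segment (not real data).
depth = [1000.0, 600.0, 900.0, 1000.0]

abundances = solve_least_squares(coverage, depth)
total = sum(abundances)
proportions = [a / total for a in abundances]

for name, a, p in zip(["wt", "DVG_A", "DVG_B"], abundances, proportions):
    print(f"{name}: abundance ~ {a:.1f}, proportion ~ {p:.2f}")
```

In this toy case the system is exactly consistent, so the least-squares solution reproduces the depths without residual. With real, noisy depth profiles the residual would be non-zero, and a non-negativity constraint on the abundances would probably be needed.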