The large tumor antigen (LT-Ag) and the major capsid protein VP1 are known to play important roles in determining the host-specific infection properties of polyomaviruses (PyVs). This study investigated which amino acid physicochemical properties of LT-Ag and VP1 most strongly affect host specificity, and evaluated classification techniques for predicting PyV hosts. We collected reference sequences of 86 viral species for analysis. Based on the clustering pattern of the reconstructed phylogenetic tree, the dataset was divided into three host groups: mammalian, avian, and fish. We then used random forest (RF), naïve Bayes (NB), and k-nearest neighbors (kNN) algorithms for host classification. Among the three algorithms, kNN achieved the highest classification accuracy for both LT-Ag (ACC = 98.83) and VP1 (ACC = 96.51). In LT-Ag, the amino acid physicochemical property most strongly correlated with host classification was charge, followed by solvent accessibility, polarity, and hydrophobicity. In VP1, however, amino acid composition showed the highest correlation with host classification, followed by charge, normalized van der Waals volume, and solvent accessibility. These results suggest that it may be possible to determine or predict the host range and infection properties of PyVs at the molecular level by identifying the host species of active and emerging PyVs that exhibit different infection properties among diverse host species. Structural and biochemical differences in the LT-Ag and VP1 proteins across host species that reflect these amino acid properties can be considered primary factors determining the host specificity of PyVs.
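To make the classification step concrete, the following is a minimal sketch of kNN host prediction from amino acid composition, one of the properties named above. It is an illustration under stated assumptions, not the study's pipeline: the abstract does not say which software, descriptors, distance metric, or value of k were used, so the Euclidean distance, `k=3`, the helper names `composition` and `knn_predict`, and the toy sequences are all hypothetical.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino acid composition (fractions) of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def knn_predict(train, query_seq, k=3):
    """Classify query_seq by majority vote among its k nearest training sequences.

    train is a list of (sequence, host_label) pairs; distance is Euclidean
    in composition space (an assumption, not taken from the study).
    """
    q = composition(query_seq)
    dists = sorted((math.dist(composition(s), q), label) for s, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Made-up toy peptides, NOT real PyV sequences; labels mirror the three host groups.
train = [
    ("KKKRRKKAAK", "mammalian"),
    ("KRKKRAKKGK", "mammalian"),
    ("KKRRKKKRAG", "mammalian"),
    ("DDEEDDEEAG", "avian"),
    ("EEDDEDAEDG", "avian"),
    ("DEDEEDDAEG", "avian"),
    ("LLVVIILLAG", "fish"),
    ("VVLLIIVLAG", "fish"),
    ("ILVLIVLLAG", "fish"),
]

predicted_host = knn_predict(train, "KKRKKRKKAA")
print(predicted_host)
```

In practice, the feature vector would be extended with the other descriptors listed above (charge, polarity, hydrophobicity, solvent accessibility, normalized van der Waals volume), and accuracy would be estimated by cross-validation over the 86 reference sequences.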