Abstract
The objective was to develop an improved model for the genetic basis of reduced susceptibility to tenofovir in vitro. A dataset of 532 HIV-1 subtype B reverse transcriptase genotypes with matched phenotypic susceptibility data was assembled, both as a continuous (transformed) dataset and as a categorical dataset generated by imposing a 1.4-fold cut-off based on earlier studies of in-vivo response. Models were generated using stepwise regression, decision tree and random forest approaches on both the continuous and categorical data. Models were compared using nested cross-validation, by mean squared error (MSE) for the continuous models or by misclassification rate for the categorical models. From the continuous dataset, stepwise linear regression, regression tree and regression forest methods yielded models with MSE of 0.46, 0.48 and 0.42, respectively. Amino acids 215, 65, 41, 67, 184 and 151 in HIV-1 reverse transcriptase were identified in all three models, and amino acid 210 in two. The categorical data yielded logistic regression, classification tree and classification forest models with misclassification rates of 26%, 24% and 23%, respectively. Amino acids 215, 65 and 67 appeared in all three models; 41, 184, 210 and 151 were also included in the classification forest model. The random forest approach yielded a substantial improvement over the available models describing the genetic basis of reduced susceptibility to tenofovir in vitro. The most important sites in these models are amino acids 215, 65, 41, 67, 184, 151 and 210 in HIV-1 reverse transcriptase.
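As a rough illustration of the two data representations and the two comparison metrics named in the abstract, the following minimal Python sketch may help. It is not the authors' code: the abstract says only that the continuous data were "transformed", so the log10 transform is an assumption, as is treating a fold change strictly above 1.4 as reduced susceptibility. The function names and the example fold-change values are hypothetical.

```python
import math

CUTOFF_FOLD_CHANGE = 1.4  # cut-off stated in the abstract


def to_continuous(fold_change):
    """Transform a fold change for regression.

    Assumption: log10 is used; the abstract does not name the transform.
    """
    return math.log10(fold_change)


def to_category(fold_change, cutoff=CUTOFF_FOLD_CHANGE):
    """Return 1 for reduced susceptibility, 0 otherwise.

    Assumption: values strictly above the cut-off count as reduced.
    """
    return 1 if fold_change > cutoff else 0


def mean_squared_error(observed, predicted):
    """Average squared difference, used to compare continuous models."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def misclassification_rate(observed, predicted):
    """Fraction of wrong class labels, used to compare categorical models."""
    return sum(o != p for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical fold-change values, for illustration only.
fold_changes = [0.8, 1.0, 1.4, 2.0, 5.0]
labels = [to_category(fc) for fc in fold_changes]
transformed = [to_continuous(fc) for fc in fold_changes]
```

In a nested cross-validation setting, metrics like these would be computed on held-out folds only, with any model tuning confined to the inner folds.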