Abstract

HIV remains a critical global health issue, with an estimated 39.9 million people living with the virus worldwide at the end of 2023, according to the WHO. Although the epidemic’s impact varies significantly across regions, Africa remains the most affected. In the past decade, considerable efforts have focused on developing preventive measures, such as vaccines and pre-exposure prophylaxis, to combat sexually transmitted HIV. Recently, cytokine profiles have gained attention as potential predictors of HIV incidence because of their involvement in immune regulation and inflammation, presenting new opportunities to enhance preventive strategies. However, the high-dimensional, time-varying nature of cytokine data collected in clinical research makes it difficult for traditional statistical methods, such as the Cox proportional hazards (PH) model, to analyze HIV-related survival data effectively. Machine learning (ML) survival models offer a robust alternative, particularly where the PH model’s assumptions are limiting. In this study, we applied survival support vector machine (SSVM) and random survival forest (RSF) models, using either changes (differences) or means in cytokine levels as predictors, to assess their association with HIV incidence, evaluate variable importance, measure predictive accuracy using the concordance index (C-index) and integrated Brier score (IBS), and interpret the models’ predictions using Shapley additive explanations (SHAP) values. Our results indicated that RSF models outperformed SSVM models, with the difference covariate model performing better than the mean covariate model. The highest C-index for SSVM was 0.7180 under the difference covariate model, while for RSF it reached 0.8801 under the difference covariate model using the log-rank split rule. Key cytokines identified as positive predictors of HIV incidence included TNF-A, BASIC-FGF, IL-5, MCP-3, and EOTAXIN, while 29 cytokines were negative predictors.
Baseline factors such as condom use frequency, treatment status, number of partners, and sexual activity also emerged as significant predictors. This study underscored the potential of cytokine profiles for predicting HIV incidence and highlighted the advantages of RSF models over SSVM models in analyzing high-dimensional, time-varying data. Ablation studies further emphasized the importance of selecting key features within the mean-based and difference-based covariate models to achieve an optimal balance between model complexity and predictive accuracy.
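
For readers unfamiliar with the concordance index reported above, the following is a minimal sketch of Harrell's C-index for right-censored data, written in plain Python. It is illustrative only and is not the implementation used in the study: the function name `concordance_index` and the toy follow-up times, event indicators, and risk scores are hypothetical. A pair of participants is treated as comparable when the one with the shorter follow-up time experienced the event, and the pair is concordant when that participant also received the higher predicted risk.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index: share of comparable pairs ranked correctly.

    times       -- follow-up time for each participant
    events      -- 1 if the event (e.g. HIV acquisition) was observed, 0 if censored
    risk_scores -- model output where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored participant cannot anchor a comparable pair.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    # Tied predictions receive half credit.
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Hypothetical toy data (not from the study): four participants, one censored.
toy_times = [2.0, 4.0, 6.0, 8.0]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.4, 0.1]

c_index = concordance_index(toy_times, toy_events, toy_risk)
```

On this scale, 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.7180 (SSVM) and 0.8801 (RSF) should be read.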
