AbstractPredictive analytics is crucial in precision medicine for personalized patient care. To aid in precision medicine, this study identifies a subset of genetic and clinical variables that can serve as predictors for classifying diseased tissues/disease types. To achieve this, experiments were performed on diseased tissues obtained from the L1000 dataset to assess differences in the functionality and predictive capabilities of genetic and clinical variables. In this study, the k‐means technique was used for clustering the diseased tissue types, and the multinomial logistic regression (MLR) technique was applied for classifying the diseased tissue types. Dimensionality reduction techniques including principal component analysis and Boruta are used extensively to reduce the dimensionality of genetic and clinical variables. The results showed that landmark genes performed slightly better in clustering diseased tissue types compared to any random set of 978 non‐landmark genes, and the difference is statistically significant. Furthermore, it was evident that both clinical and genetic variables were important in predicting the diseased tissue types. The top three clinical predictors for predicting diseased tissue types were identified as morphology, gender, and age of diagnosis. Additionally, this study explored the possibility of using the latent representations of the clusters of landmark and non‐landmark genes as predictors for an MLR classifier. The classification models built using MLR revealed that landmark genes can serve as a subset of genetic variables and/or as a proxy for clinical variables. This study concludes that combining predictive analytics with dimensionality reduction effectively identifies key predictors in precision medicine, enhancing diagnostic accuracy.
Read full abstract