Introduction The Wnt (wingless-related integration site) signalling pathway is crucial for bone formation and remodelling: it regulates the commitment of mesenchymal stem cells (MSCs) to the osteoblastic lineage, triggers the transcriptional activation of Wnt target genes, and promotes osteoblast proliferation and survival. Weighted gene co-expression network analysis (WGCNA) and differential gene expression analysis help researchers understand gene roles. Gradient boosting, a machine learning technique, can help characterise the genetic and molecular mechanisms associated with overlapping genes, supporting studies of gene regulation and functional genomics. This study aims to predict overlapping genes in the Wnt signalling pathway.
Methods Differential gene expression analysis was performed on the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) dataset GSE251951, focusing on the effect of Wnt signalling on treatment. WGCNA modules were analysed with the iDEP tool to identify clusters of interconnected genes. Hub genes were identified by calculating module eigengenes, correlating them with external traits, and ranking genes by module membership values. Gradient boosting, an ensemble learning method that adjusts its predictions at each step according to the gradient and a learning rate, was used to build the predictive model, and its performance was evaluated using metrics such as accuracy, precision, recall, and F1 score.
Results The dendrogram was partitioned into gene clusters with the "Dynamic TreeCut" algorithm, which aids in understanding gene modules and biological processes, identifying co-expressed genes, and discovering new pathways. The confusion matrix covers 88 cases, comparing actual with predicted classes. The gradient boosting model achieved 78.9% accuracy in predicting Wnt pathway overlapping genes, with a respectable area under the curve (AUC) and classification accuracy. It correctly predicted 73.9% of samples, with high precision and low recall.
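As a rough illustration of how the evaluation metrics named in the Methods (accuracy, precision, recall, and F1 score) follow from a binary confusion matrix, here is a minimal Python sketch. The `ConfusionMatrix` class, the `classification_metrics` function, and the counts are hypothetical and chosen only to show a high-precision, low-recall pattern; they are not the study's code or data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper, not from the study)."""
    tp: int  # true positives: overlapping genes predicted as overlapping
    fp: int  # false positives: other genes predicted as overlapping
    fn: int  # false negatives: overlapping genes that were missed
    tn: int  # true negatives: other genes predicted as other


def classification_metrics(cm: ConfusionMatrix) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    predicted_pos = cm.tp + cm.fp
    actual_pos = cm.tp + cm.fn

    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (cm.tp + cm.tn) / total if total else 0.0
    precision = cm.tp / predicted_pos if predicted_pos else 0.0
    recall = cm.tp / actual_pos if actual_pos else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative counts only (assumed, NOT taken from the study's confusion matrix).
example = ConfusionMatrix(tp=20, fp=2, fn=18, tn=60)
metrics = classification_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With counts like these, few predicted positives are wrong (high precision) while many true positives are missed (low recall), which is the kind of pattern the Results describe qualitatively.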
Conclusion Future research should refine the differential expression analysis and WGCNA to identify key Wnt pathway genes, improve sensitivity and specificity, tune hyperparameters, perform validation experiments, and use larger datasets.