Background: Osteoarthritis (OA) is a common chronic joint disease. This study aimed to identify candidate diagnostic biomarkers for OA and to verify their significance in clinical samples.

Methods: We used three datasets from the Gene Expression Omnibus (GEO) database as the training set. We first determined differentially expressed genes (DEGs) and then screened candidate diagnostic biomarkers with three machine learning algorithms: Random Forest, Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression, and Support Vector Machine-Recursive Feature Elimination (SVM-RFE). Another GEO dataset was used as the validation set. The test set consisted of RNA-sequenced peripheral blood samples collected from patients and healthy donors. Blood samples and chondrocytes were also collected for quantitative real-time PCR to confirm expression levels. Receiver operating characteristic (ROC) curves were generated for individual and combined biomarkers.

Results: In total, 251 DEGs were identified, of which B3GALNT1, SCRG1 and ZNF423 were selected by all three algorithms. In our test set, the area under the curve (AUC) of the individual biomarkers was lower than in the public datasets. GRB10 had the highest AUC in the training set (0.947) but reached only 0.691 in our test set, whereas the combined model comprising B3GALNT1, GRB10, KLF9 and SCRG1 achieved an AUC of 0.986 in the training set, 1.000 in the validation set and 0.836 in our test set.

Conclusion: We identified a combined model for early diagnosis of OA that includes B3GALNT1, GRB10, KLF9 and SCRG1. This finding offers new avenues for further exploration of the mechanisms underlying OA.
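
The two computational steps summarized above, keeping only the genes selected by every algorithm and scoring a biomarker by its AUC, can be illustrated with a minimal sketch. This is not the study's pipeline, whose software and parameters are not stated here. The function names `consensus_features` and `roc_auc`, the gene labels `GENE_A` to `GENE_E`, and all scores are hypothetical toy values chosen only for illustration. The AUC is computed with the rank-based (Mann-Whitney) definition: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half.

```python
from typing import Dict, Iterable, Sequence, Set


def consensus_features(selections: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the features chosen by every selection method."""
    sets = [set(genes) for genes in selections.values()]
    return set.intersection(*sets) if sets else set()


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: P(case score > control score), ties count 0.5."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical selections from three algorithms (placeholder gene names).
selections = {
    "random_forest": ["GENE_A", "GENE_B", "GENE_C"],
    "lasso": ["GENE_A", "GENE_C", "GENE_D"],
    "svm_rfe": ["GENE_A", "GENE_C", "GENE_E"],
}
shared = consensus_features(selections)

# Toy labels (1 = case, 0 = control) and made-up scores for a single
# marker and for a combined model; none of these are study data.
labels = [1, 1, 1, 0, 0, 0]
scores_single = [0.9, 0.4, 0.7, 0.5, 0.2, 0.3]
scores_combined = [0.9, 0.6, 0.7, 0.5, 0.2, 0.3]

auc_single = roc_auc(labels, scores_single)
auc_combined = roc_auc(labels, scores_combined)
print(sorted(shared), round(auc_single, 3), round(auc_combined, 3))
```

In practice the combined score would come from a fitted model, such as a logistic regression over the selected genes, and it should be evaluated on samples not used for fitting, which is the role the validation and test sets play in the design described above.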