Objectives: Rheumatoid arthritis (RA) is a complex disease that is difficult to diagnose, especially in seronegative patients. This study investigated whether methylation sites associated with the overall immune response in RA can assist clinical diagnosis, using targeted methylation sequencing of peripheral venous blood samples.
Methods: The study enrolled 241 RA patients, 30 osteoarthritis (OA) patients, and 30 healthy volunteer controls (HC). Fifty cytosine-guanine (CG) sites that differed significantly between undifferentiated arthritis and RA were selected and analyzed using targeted DNA methylation sequencing. Logistic regression was used to build diagnostic models for different clinical features of RA, and six machine learning methods (logit model, random forest, support vector machine, AdaBoost, naive Bayes, and learning vector quantization) were used to construct clinical diagnostic models for different subtypes of RA. Least absolute shrinkage and selection operator (LASSO) regression and detrended correspondence analysis were used to screen for important CG sites. Correlations were quantified with the Spearman correlation coefficient.
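As an illustration of the correlation step only, the following is a minimal sketch of a Spearman coefficient computed as a Pearson correlation on tie-averaged ranks. It is not the study's code: the function names and the example values (methylation levels at one CG site and a clinical indicator) are hypothetical and chosen purely for demonstration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values, for illustration only (not data from the study).
methylation = [0.12, 0.35, 0.35, 0.50, 0.81]
indicator = [3.0, 8.5, 7.0, 12.0, 20.0]
rho = spearman(methylation, indicator)
```

Because the coefficient depends only on ranks, it captures monotonic association without assuming a linear relationship between methylation level and the clinical indicator.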
Results: The study identified 16 important CG sites, including tumor necrosis factor receptor associated factor 5 (TRAF5) (chr1:211500151), mothers against decapentaplegic homolog 3 (SMAD3) (chr15:67357339), tumor endothelial marker 1 (CD248) (chr11:66083766), lysosomal trafficking regulator (LYST) (chr1:235998714), PR domain zinc finger protein 16 (PRDM16) (chr1:3307069), A-kinase anchoring protein 10 (AKAP10) (chr17:19850460), G protein subunit gamma 7 (GNG7) (chr19:2546620), yes1 associated transcriptional regulator (YAP1) (chr11:101980632), PRDM16 (chr1:3163969), histone deacetylase complex subunit sin3a (SIN3A) (chr15:75747445), prenylated rab acceptor protein 2 (ARL6IP5) (chr3:69134502), mitogen-activated protein kinase kinase kinase 4 (MAP3K4) (chr6:161412392), wnt family member 7A (WNT7A) (chr3:13895991), inhibin subunit beta B (INHBB) (chr2:121107018), deoxyribonucleic acid replication helicase/nuclease 2 (DNA2) (chr10:70231628), and chromosome 14 open reading frame 180 (C14orf180) (chr14:105055171). Seven CG sites differed significantly among the three groups (P < 0.05), and 16 CG sites were significantly correlated with common clinical indicators (P < 0.05). For high-level clinical indicators of high clinical value, diagnostic models constructed using different CG sites had an area under the receiver operating characteristic curve (AUC) of 0.64–0.78, with specificity ranging from 0.42 to 0.77 and sensitivity ranging from 0.57 to 0.88. For low-level clinical indicators of high clinical value, the AUC was 0.63–0.72, with specificity ranging from 0.48 to 0.74 and sensitivity ranging from 0.72 to 0.88. Diagnostic models constructed using different CG sites showed good overall diagnostic accuracy for the four subtypes of RA, with accuracy of 0.61–0.96, balanced accuracy of 0.46–0.94, and AUC of 0.46–0.94.
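The metrics reported above are related by standard definitions, sketched below for a binary classifier. This is a generic illustration, not the study's evaluation code: the function name and the label vectors are hypothetical. It shows why accuracy and balanced accuracy can diverge when one class is larger than the other.

```python
def diagnostic_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, accuracy, and balanced accuracy.

    Labels are 1 for the positive class (e.g. disease) and 0 for the negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    sensitivity = tp / (tp + fn)   # fraction of positives correctly detected
    specificity = tn / (tn + fp)   # fraction of negatives correctly rejected
    accuracy = (tp + tn) / len(pairs)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical labels, for illustration only: four positives, two negatives.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]
metrics = diagnostic_metrics(y_true, y_pred)
```

In this toy example the accuracy (4/6) exceeds the balanced accuracy (0.625) because the larger positive class is classified better than the smaller negative class, which is the kind of gap that can arise with imbalanced groups.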
Conclusions: This study identified potential clinical diagnostic biomarkers for RA and provided novel insights into the diagnosis and subtyping of RA. The use of targeted deoxyribonucleic acid (DNA) methylation sequencing and machine learning methods for establishing diagnostic models for different clinical features and subtypes of RA is innovative and can improve the accuracy and efficiency of RA diagnosis.