Purpose The purpose of this study was (1) to analyze judges' evaluations of rhythmic gymnastics performance by applying generalizability theory and (2) to suggest recommendations for improving judges' ratings.

Methods The data were the scores of 34 players in the senior division of the 29th KGA President's Cup National Rhythmic Gymnastics Championship in Korea. Difficulty and execution scores in the ball, clubs, hoop, and ribbon events were analyzed. Analysis models that included area and reputation rank as components were designed, and multivariate generalizability theory was used for the analysis.

Results The G-study results showed that (1) in the model containing only the player and judge components, the player source had a greater impact on the evaluation than any other error source; (2) in the model that added the area component, the player source again had the greatest impact on the evaluation, except in the clubs event, where the area source had a greater impact than the other error sources; and (3) in the model that added the reputation rank component, the player source again had the greatest impact on the evaluation, except in the hoop event, where the reputation rank source had a greater impact than the other error sources in the model that added the area component. The D-study results showed that the generalizability coefficient was stable in the model without the area and reputation rank components, but was not stable for some events in the models containing those components.

Conclusion Recommendations for improving judging were discussed.
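
The G-study and D-study steps named above can be illustrated with a minimal sketch. It assumes a much simpler design than the one used in the study: a univariate, fully crossed player × judge model with one score per cell, without the area or reputation rank components and without the multivariate treatment of difficulty and execution scores. The score matrix is made up for illustration and is not taken from the championship data; the function names `g_study` and `g_coefficient` are hypothetical.

```python
def g_study(scores):
    """Estimate variance components for a crossed player x judge design.

    scores[p][j] is the score judge j gave player p (one score per cell).
    Components are estimated from the expected mean squares of a two-way
    ANOVA without replication; negative estimates are truncated to zero.
    """
    n_p, n_j = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_j)
    p_means = [sum(row) / n_j for row in scores]
    j_means = [sum(scores[p][j] for p in range(n_p)) / n_p for j in range(n_j)]

    # Sums of squares for players, judges, and the residual (p x j, error).
    ss_p = n_j * sum((m - grand) ** 2 for m in p_means)
    ss_j = n_p * sum((m - grand) ** 2 for m in j_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_j

    ms_p = ss_p / (n_p - 1)
    ms_j = ss_j / (n_j - 1)
    ms_res = ss_res / ((n_p - 1) * (n_j - 1))

    return {
        "player": max((ms_p - ms_res) / n_j, 0.0),
        "judge": max((ms_j - ms_res) / n_p, 0.0),
        "residual": ms_res,
    }


def g_coefficient(components, n_judges):
    """D-study: generalizability coefficient for a panel of n_judges.

    Uses relative error only, i.e. the residual component averaged
    over the judges on the panel.
    """
    relative_error = components["residual"] / n_judges
    return components["player"] / (components["player"] + relative_error)


# Hypothetical scores: 4 players (rows) rated by 3 judges (columns).
scores = [
    [8.0, 8.2, 7.9],
    [6.5, 6.9, 6.4],
    [7.2, 7.6, 7.3],
    [9.0, 9.1, 8.8],
]

components = g_study(scores)
print(components)

# D-study: how the coefficient changes as the judging panel grows.
for n in (1, 2, 3, 4, 5):
    print(n, round(g_coefficient(components, n), 4))
```

In this simplified setting, a large player component relative to the judge and residual components means that most score variation reflects differences between players. The D-study loop shows how the coefficient would change with panel size under the same assumptions.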