Abstract In the data-driven era, it is essential to explore practical ways to cultivate the core literacy of sports professionals in colleges and universities. This study uses data mining to analyze the core literacy of physical education majors in depth and to derive a more scientific and systematic cultivation strategy. The decision tree CART algorithm and an improved Apriori algorithm were applied to the physical fitness data of students in the School of Physical Education of S University. The CART algorithm was used to classify the physical fitness data of male and female students; male students' main physical fitness deficiencies were concentrated in upper body strength and the standing long jump, whereas female students showed deficiencies in endurance and lower body explosive strength. The improved Apriori algorithm revealed association rules between different physical fitness items; for example, male students' 50-meter run performance is strongly associated with their pull-up performance. The factors influencing physical fitness differ clearly between male and female students, so training programs need to be designed separately for each group. Association rule mining also showed that specific physical test items are significantly associated with students' overall physical fitness. This study provides a new, data-mining-based path for cultivating the core literacy of physical education professionals and offers a scientific basis and practical guidance for physical education in colleges and universities.
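To make the notion of an association rule between fitness items concrete, the following minimal sketch computes the support, confidence, and lift of one rule. It is an illustration only, not the improved Apriori algorithm used in this study: the records, the item names (`run_50m`, `pull_up`, `long_jump`), and the helper functions are hypothetical and do not come from the S University data.

```python
# Hypothetical records: each set lists the test items in which one
# student scored below a chosen threshold. Not data from the study.
records = [
    {"run_50m", "pull_up"},
    {"run_50m", "pull_up", "long_jump"},
    {"pull_up"},
    {"run_50m", "pull_up"},
    {"long_jump"},
    {"run_50m"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for r in records if itemset <= r) / len(records)


def rule_metrics(antecedent, consequent, records):
    """Return (support, confidence, lift) for antecedent -> consequent."""
    s_both = support(set(antecedent) | set(consequent), records)
    s_ante = support(antecedent, records)
    s_cons = support(consequent, records)
    confidence = s_both / s_ante   # estimate of P(consequent | antecedent)
    lift = confidence / s_cons     # values above 1 suggest positive association
    return s_both, confidence, lift


sup, conf, lift = rule_metrics({"run_50m"}, {"pull_up"}, records)
print(f"support={sup:.3f} confidence={conf:.3f} lift={lift:.3f}")
```

An Apriori-style search would evaluate such metrics for many candidate itemsets and keep only the rules that exceed minimum support and confidence thresholds; the sketch shows the scoring of a single rule.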