In recent years, reading comprehension has increasingly been treated as an indicator of lexical and grammatical proficiency. As sentences are the basic units of discourse structure, sentence difficulty is often applied to the study of text difficulty. Although a number of studies have examined sentence difficulty, inconsistency in the chosen indicators and a focus on isolated grammatical issues have limited this line of research. Therefore, this study adopts a corpus-based approach, using corpora as an objective and scientific data source. The study uses the Digital Platform for Chinese Grammar and the 8000 Chinese Words as its main reference sources. Additionally, CRIE 3.0 is employed to validate the texts and establish sentence difficulty indicators. However, because certain indicators in the Digital Platform for Chinese Grammar are not yet fully developed, the study also refers to the Chinese Proficiency Grading Standards for International Chinese Language Education and the Hanyu Shuiping Kaoshi (HSK) to establish comprehensive difficulty indicators for sentence structures and sentence components. The established indicators are then validated through corpus-based comparative analyses: native speaker corpora serve as benchmarks, Mandarin learner corpora are used for comparison, and the results are further validated objectively with a machine learning model. This validation aims to examine the validity and reliability of the selected indicators and to establish a calculation method, "grammar level × grammar point distribution ratio", for determining the difficulty indicators of Chinese sentences. Additionally, expert reliability is assessed to ensure the credibility of the indicators.
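The calculation of grammar level multiplied by grammar point distribution ratio can be illustrated with a minimal sketch. The abstract does not state how the per-grammar-point products are combined, so the summation, the function name `difficulty_score`, and the sample levels and ratios below are assumptions chosen purely for illustration, not values or procedures from the study.

```python
def difficulty_score(grammar_points):
    """Sum (grammar level * distribution ratio) over a sentence's grammar points.

    ASSUMPTION: products are combined by simple summation; the source
    only names the product itself, not the aggregation step.
    """
    return sum(level * ratio for level, ratio in grammar_points)


# Hypothetical sentence: each pair is (grammar level, distribution ratio).
# These numbers are made up for the example.
sample_sentence = [(1, 0.5), (2, 0.25), (4, 0.25)]

score = difficulty_score(sample_sentence)
print(score)
```

Under these assumptions, a sentence whose grammar points sit mostly at low levels receives a lower score than one dominated by high-level grammar points, which is the kind of ordering that the learner and native speaker corpus comparison could then be used to check.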