Background: The majority of rare diseases are complex diseases caused by a combination of multiple pathogenic factors. However, uncovering the complex etiology and pathogenesis of rare diseases is difficult because clinical resources are limited and conventional statistical methods are poorly suited to the task. This study investigates the interrelationships and effects of potential factors in pediatric cataract, as an exploration of data mining strategies for rare diseases.

Methods: We established a pilot specialized care center for rare diseases to systematically record all information and the entire treatment process of pediatric cataract patients. These clinical records contain the medical history, multiple structural indices, and comprehensive functional metrics. A two-layer structural equation model network was applied, and eight potential factors were selected and included in the final model.

Results: Four risk factors (area, density, location, and abnormal pregnancy history) and four beneficial factors (axis length, uncorrected visual acuity, intraocular pressure, and age at diagnosis) were identified. Quantitative results suggested that abnormal pregnancy history may be the principal risk factor among medical history items for pediatric cataract. Moreover, axis length, density, uncorrected visual acuity, and age at diagnosis were the dominant factors and should be emphasized in regular clinical practice.

Conclusions: This study proposes a generalized, evidence-based pattern for data mining in rare and complex diseases, provides new insights and clinical implications for pediatric cataract, and promotes rare-disease research and prevention to benefit patients.