Abstract

Background

Alzheimer's disease (AD) is not caused by a single factor but by a variety of factors, some of which are considered to be important. At present, the factors affecting limbic-predominant age-related TDP-43 encephalopathy (LATE) are largely unknown; in particular, the factors that distinguish AD from LATE are yet to be identified.

Method

We applied an integrated feature selection-based algorithm comprising preprocessing, feature selection, and validation, abbreviated as PFV (Figure 1), to identify important environmental risk factors that differentiate subjects with LATE and/or AD from Controls with healthy cognition in significantly imbalanced data. PFV was applied to two existing large-scale datasets, a discovery dataset (ROSMAP) and a validation dataset (NACC), each comprising subjects with LATE, subjects with AD, and Controls. Independent cross-dataset validation was performed in NACC using the chi-square test and logistic regression analysis.

Result

We retrospectively analyzed the data of 508 subjects in ROSMAP for discovery and 9,256 subjects in NACC for independent validation. In ROSMAP, the sets of risk factors that we identified were usually small. For example, in the LATE vs. AD scenario, compared with using all variables, 2 risk factors were identified for the whole cohort with an improvement in AUROC of about 7%, 1 risk factor for the Male subpopulation with an improvement of about 26%, 3 risk factors for the Female subpopulation with an improvement of about 7%, 2 risk factors for the White subpopulation with an improvement of about 6%, and 3 risk factors for the >85 subpopulation with an improvement of about 9% (Figure 2 and Table 2). Our results suggest that alcohol intake and living activities are significant risk factors for LATE; for instance, in the LATE vs.
Control scenario, lifetime daily alcohol intake is a risk factor common to the whole cohort (p-value: 3.14e-03), the Male subpopulation (p-value: 4.00e-02), the White subpopulation (p-value: 1.14e-03), and the >85 subpopulation (p-value: 4.11e-03). In NACC, statistical analyses validated the role of alcohol consumption in AD and LATE.

Conclusion

We found that alcohol intake affects AD differently from LATE, a finding first identified in the discovery dataset ROSMAP and then validated in the independent dataset NACC.
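
The cross-dataset validation described in the Method relies in part on a chi-square test. As a rough illustration of what such a test involves for a single binary risk factor, the following minimal sketch computes the Pearson chi-square statistic and p-value for a 2x2 table. The function name `chi_square_2x2`, the table layout, and the counts are illustrative assumptions; they are not the authors' code and are not taken from ROSMAP or NACC.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Assumed layout (hypothetical):
        rows    = exposed / unexposed to the risk factor
        columns = case / control
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one subject")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square variable with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Made-up counts for illustration only (not study data).
stat, p = chi_square_2x2(30, 70, 15, 85)
print(f"chi-square = {stat:.3f}, p-value = {p:.4f}")
```

A small p-value here would indicate an association between the exposure and case status in the made-up table; the study's actual analysis additionally used logistic regression, which this sketch does not cover.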