679 Background: Pancreatic Adenocarcinoma (PDAC) is often diagnosed at an advanced stage. We sought to develop a model for early PDAC prediction in the general population, using electronic health records (EHRs) and machine learning. Methods: We used three EHR datasets from Beth-Israel Deaconess Medical Center (BIDMC) and Partners Healthcare (PHC): 1. “BIDMC-Development-Data” (BIDMC-DD) for model development, using a feed-forward neural network (NN) and L2-regularized logistic regression,randomly split (80:20) into training and test groups. We tuned hyperparameters using cross-validation in training, and report performance on the test split. 2. “BIDMC-Large-Data” (BIDMC-LD) to re-fit and calibrate models. 3. “PHC-Data” for external validation. We evaluate using Area Under the Receiver Operating Characteristic Curve (AUC) and compute 95% CI using empirical bootstrap over test data. PDAC patients were selected using ICD9/-10 codes and validated with tumor registries. In contrast to prior work, we did not predefine feature sets based on known clinical correlates and instead employed data-driven feature selection, specifically importance-based feature pruning, regularization, and manual validation, to identify diagnostic-based features. Results: BIDMC-DD included demographics, diagnoses, labs and medications for 1018 patients (cases = 509; age-sex paired controls). BIDMC-LD included diagnoses for 547,917 patients (cases = 509), and PHC included diagnoses for 160,593 patients (cases = 408). We compared our approach to adapted and re-fitted published baselines. With a 365-day lead-time, NN obtained a BIDMC-DD test AUC of 0.84 (CI 0.79 - 0.90) versus the previous best baseline AUC of 0.70 (CI 0.62 - 0.78). We also validated using BIDMC-DD’s test cancer patients and BIDMC LD controls. The AUC was 0.71 (CI 0.67 - 0.76) at the 365-day cutoff. NN’s external validation AUC on PHC-Data was 0.71 (CI 0.63 - 0.79), outperforming an existing model’s AUC of 0.61 (CI 0.52 - 0.70) (Baecker et al, 2019). Conclusions: Models based on data-driven feature selection outperform models that use predefined sets of known clinical correlates and can help in early prediction of PDAC development.