e13559 Background: Recent advances in targeted therapies have increased survival and quality of life for patients with non-small cell lung cancer (NSCLC). Understanding clinical characteristics and predicting HER2 status may facilitate efficient use of next-generation sequencing (NGS) testing and promote precision medicine and targeted therapies. Methods: NSCLC patients ≤65 years old who were diagnosed between 01/2011 and 06/2021 and had NGS test results were identified from two National Cancer Institute-designated cancer centers in the Northeastern region of the United States. Demographic, behavioral, and clinical characteristics were extracted from electronic health record (EHR) data and an EHR-linked cancer registry. Prevalence prediction modelling of HER2/ERBB2 mutation was performed using machine learning (ML) methods including logistic regression (LR), random forest (RF), naïve Bayes (NB), extreme gradient boosting (XGBoost) and multi-layer perceptron (MLP). 10-fold cross-validation was used, and missing values were imputed with the k-nearest neighbor method. Under-sampling, oversampling and feature selection methods were also explored. Results: Among 761 NSCLC patients with NGS reports available, 353 (46%) were ≤65 years old. Among these 353 patients (mean age 57, 58% female, 88% non-Hispanic, 69% white, 57% with metastasis, 58% prescribed chemotherapy), 25 (7%) had a HER2 mutation. Based on univariate significance tests with LR on the complete data, cerebrovascular disease, BMI of 27.5-29.9, KRAS, EGFR and MET were significant, with odds ratios (ORs) ranging from 0.2 to 3.9. In our prediction models, BMI, cerebrovascular disease, weight loss, hepatic metastases, diabetes, congestive heart failure, and KRAS, EGFR, MET and RET alteration status were significant predictors in multivariate analyses, suggesting the potential usefulness of these features in predicting HER2 mutation among patients with NSCLC.
Area under the ROC curve (AUC) varied across models, from 70.9% (RF) and 70% (XGBoost and MLP) to 69.5% (LR) and 68.2% (NB). The best-performing model, RF, with an AUC of ~71%, can provide relatively high specificity and acceptable sensitivity depending on the cutoff. Conclusions: Our prediction models yielded AUCs of around 70% in a subgroup aged ≤65, with the best-performing model reaching an AUC of 71%. After further validation, our prediction model may be used to identify patients who did not have HER2 testing in their first molecular profiling and are appropriate for repeat NGS testing. Future work is needed to predict status in repeat testing of previously test-negative patients who relapse or have disease progression. Random forest (RF) model performance across different cutoffs for the predicted probability. [Table: see text]
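The cutoff-dependent trade-off between sensitivity and specificity described above can be illustrated with a minimal Python sketch. It is not the study's code: the labels and predicted probabilities below are hypothetical toy values (not study data), and the function names `sensitivity_specificity` and `auc` are chosen here for illustration only.

```python
from typing import Dict, Sequence


def sensitivity_specificity(
    y_true: Sequence[int], y_prob: Sequence[float], cutoff: float
) -> Dict[str, float]:
    """Classify as positive when probability >= cutoff, then score."""
    tp = fn = tn = fp = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= cutoff
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    # Sensitivity: share of true positives detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of true negatives correctly ruled out.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {
        "cutoff": cutoff,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random
    negative (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = mutation present, 0 = absent.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_probs = [0.9, 0.6, 0.3, 0.8, 0.5, 0.4, 0.2, 0.2, 0.1, 0.1]

if __name__ == "__main__":
    print(f"AUC = {auc(toy_labels, toy_probs):.3f}")
    for cutoff in (0.25, 0.5, 0.7):
        row = sensitivity_specificity(toy_labels, toy_probs, cutoff)
        print(
            f"cutoff={row['cutoff']:.2f}  "
            f"sensitivity={row['sensitivity']:.2f}  "
            f"specificity={row['specificity']:.2f}"
        )
```

In this toy example, raising the cutoff lowers sensitivity and raises specificity, which is the kind of trade-off a per-cutoff performance table summarizes. With a rare outcome, the choice of cutoff depends on whether missed positives or unnecessary repeat tests are the greater concern.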