We evaluated the generalizability and accuracy of the IBM® MarketScan® Health Risk Assessment (HRA) data to assess its suitability as a supplement to linked claims data. We identified adult private insurance enrollees in the IBM® MarketScan® Commercial Claims & Encounters (CC&E) and HRA databases between 2012 and 2017. In the claims data, for each enrollee, we sampled the first calendar year with continuous enrollment, which indicates full capture of claims data, and extracted linked HRA survey data where available. We compared HRA participants and non-participants with respect to demographics, prevalence of chronic conditions, and healthcare utilization. Restricting to the subsample with HRA data, we estimated the negative predictive value (NPV) of obesity and smoking as reported in the HRA against diagnosis codes in the claims data. Between 2012 and 2017, 2 693 444 HRA participants and 31 450 000 non-participants were included in the study. Chronic diseases were similarly distributed between the two populations, with hypertension and hyperlipidemia showing the largest difference in prevalence (1.4%). The two samples showed similar healthcare utilization. For both obesity and smoking, the proportion of false negatives in the HRA data, judged against a positive diagnosis in the claims data, was low (<1%). Prevalence estimates of both variables were similar to national estimates. Our findings suggest that the overall HRA population may be representative of the overall claims population and that the HRA provides certain data elements with satisfactory accuracy.
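As an illustration of the NPV calculation, the sketch below assumes the HRA self-report is treated as the test and the claims diagnosis as the reference, so that NPV = TN / (TN + FN). The function name and the counts are hypothetical and are not taken from the study.

```python
def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """Share of HRA-negative enrollees who also lack a claims diagnosis.

    Assumption for this sketch: HRA self-report is the test and the
    claims diagnosis code is the reference standard.
    """
    total_negatives = true_negatives + false_negatives
    if total_negatives == 0:
        raise ValueError("NPV is undefined when no enrollee is HRA-negative")
    return true_negatives / total_negatives


# Hypothetical counts, for illustration only (not study data).
tn, fn = 9_950, 50
npv = negative_predictive_value(tn, fn)
false_negative_share = 1 - npv  # complement of the NPV
print(f"NPV = {npv:.3f}, false-negative share = {false_negative_share:.3f}")
```

Under this framing, the share of false negatives among HRA-negative enrollees is the complement of the NPV, so a share below 1% corresponds to an NPV above 99%.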