Objective

To assess the use of medical claims records for surveillance and epidemiological inference through a case study that examines how ecological and social determinants and measurement error contribute to spatial heterogeneity in reports of influenza-like illness across the United States.

Introduction

Traditional infectious disease epidemiology is built on a foundation of high-quality, high-accuracy data on disease and behavior. Digital infectious disease epidemiology, by contrast, repurposes existing digital traces to identify patterns in health-related processes. Medical claims are an emerging digital data source for surveillance. They capture patient-level data across an entire population of healthcare seekers. They also offer the medical accuracy of physician diagnoses, along with fine spatial and temporal resolution in near real time.

Our work harnesses the large volume and high specificity of diagnosis codes in medical claims to improve our understanding of the mechanisms driving spatial variation in reported influenza activity each year. The mechanisms hypothesized to drive these patterns are varied: environmental factors affecting transmission or virus survival, travel flows between populations, population age structure, and socioeconomic factors linked to healthcare access and quality of life. Beyond these process mechanisms, the way surveillance data are collected may affect our interpretation of spatial epidemiological patterns [1]. This is particularly true because influenza is a non-reportable disease with non-specific symptoms, and its infections range from asymptomatic to severe. Given how medical claims are generated, biases may arise from healthcare-seeking behavior, insurance coverage, and medical claims database coverage in study populations.

Methods

Using aggregated U.S. medical claims for influenza-like illness (ILI) from the 2001-2002 through 2008-2009 flu seasons [2], we developed a Bayesian hierarchical modeling framework. It estimates the contributions of ecological and social determinants, and of measurement-related factors, to observed county-level variation in influenza disease burden across the United States. We used Integrated Nested Laplace Approximation (INLA) for Bayesian inference to keep the analysis computationally tractable, given the high spatial resolution of our data (Figure 1) and the large number of models in our analysis [3]. By linking data from a variety of publicly available sources, we determined the strength, directionality, and consistency of these factors over multiple flu seasons.

Results

We found that measurement-related factors (healthcare-seeking behavior, insurance coverage, and medical claims database coverage) were strong predictors of greater ILI intensity across seasons. Secondarily, poverty and specific humidity were negatively associated with ILI intensity in several seasons. Finally, because our model incorporates both mechanistic and measurement factors, its predictions provide an improved map of ILI in the United States for the flu seasons in our study period.

Conclusions

We present a flexible modeling approach that can be applied to other medical claims diagnosis codes and disease surveillance data. We also demonstrate the utility of Bayesian hierarchical models for large-scale ecological analyses. Our results improve knowledge of the spatial distribution of influenza and of the underlying processes that drive these patterns. They support finer spatial targeting of different types of interventions and enable interpolation of burden in areas that are difficult to surveil through traditional public health surveillance.
Moreover, they highlight the relative contributions of surveillance data collection and ecological processes to spatial variation in disease. They also underscore the importance of considering measurement biases when using surveillance data for epidemiological inference.
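The Methods rely on INLA, which builds on nested Laplace approximations. The Python below is a minimal sketch of that core idea under stated assumptions; it is not the authors' model. It fits a Poisson log-linear regression of county ILI counts on covariates with Gaussian priors on the coefficients. It then approximates the posterior by a Gaussian centred at the posterior mode, which is a plain Laplace approximation.

Everything specific here is hypothetical: the simulated counties, the covariate names `care_seeking` and `poverty`, the expected counts used as an offset, and the prior standard deviation of 10. The sketch also omits the spatial random effects and the integration over hyperparameters that INLA performs.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b (a is a list of rows) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def posterior_derivatives(X, y, expected, beta, prec):
    """Gradient and negative Hessian of the log posterior for
    y_i ~ Poisson(expected_i * exp(x_i . beta)), beta_j ~ Normal(0, 1 / prec)."""
    k = len(beta)
    mu = [e * math.exp(sum(xj * bj for xj, bj in zip(x, beta))) for x, e in zip(X, expected)]
    grad = [sum(x[j] * (yi - mi) for x, yi, mi in zip(X, y, mu)) - prec * beta[j]
            for j in range(k)]
    neg_hess = [[sum(x[a] * mi * x[b] for x, mi in zip(X, mu)) + (prec if a == b else 0.0)
                 for b in range(k)] for a in range(k)]
    return grad, neg_hess


def laplace_poisson(X, y, expected, prior_sd=10.0, max_iter=50, tol=1e-10):
    """Newton iterations to the posterior mode, then a Gaussian (Laplace) approximation.
    Returns (posterior mode, approximate posterior standard deviations)."""
    prec = 1.0 / prior_sd ** 2
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        grad, neg_hess = posterior_derivatives(X, y, expected, beta, prec)
        step = solve(neg_hess, grad)  # Newton step toward the mode
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Posterior covariance is approximated by the inverse negative Hessian at the mode.
    _, neg_hess = posterior_derivatives(X, y, expected, beta, prec)
    k = len(beta)
    sd = [math.sqrt(solve(neg_hess, [1.0 if i == j else 0.0 for i in range(k)])[j])
          for j in range(k)]
    return beta, sd


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the moderate means simulated here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counties(n=500, beta=(0.2, 0.5, -0.3), seed=1):
    """Hypothetical county data: intercept, care_seeking, poverty (standardized)."""
    rng = random.Random(seed)
    X, y, expected = [], [], []
    for _ in range(n):
        x = [1.0, rng.gauss(0, 1), rng.gauss(0, 1)]
        e = rng.uniform(5, 30)  # hypothetical expected ILI count (offset)
        lam = e * math.exp(sum(xj * bj for xj, bj in zip(x, beta)))
        X.append(x)
        expected.append(e)
        y.append(poisson_draw(lam, rng))
    return X, y, expected


X, y, expected = simulate_counties()
mode, sd = laplace_poisson(X, y, expected)
for name, m, s in zip(["intercept", "care_seeking", "poverty"], mode, sd):
    print(f"{name:>12}: {m:+.3f} (approx. posterior sd {s:.3f})")
```

Because the Gaussian approximation is computed once at the mode, it stays cheap even with thousands of counties. INLA goes further by also integrating over hyperparameters such as random-effect variances, which this sketch does not attempt.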