Background
There have been many efforts to expand existing data collection initiatives to include COVID-19 related data. One such program is UK Biobank, a large-scale biomedical research and data collection resource that added several COVID-19 related data fields, including questionnaires (exposures and symptoms), viral testing, and serological data. This study analyzed these data to understand how they were collected and how they can be used to attribute COVID-19 status to participants and to analyze differences between cohorts and across time periods.

Methods
A cohort of COVID-19 infected individuals was defined from the UK Biobank population using viral testing, diagnosis, and self-reported data. Changes from March 2020 to October 2021 in total case counts, and in case counts by identification source (diagnosis from EHR, measurement from viral testing, and self-report from questionnaire), were also analyzed. For the questionnaires, the structure and dynamics were analyzed, including the number and type of questions asked, how often and by how many individuals the questions were answered, and what responses were given. In addition, the number of individuals who provided responses for each time segment covered by the questionnaire was calculated, along with how often responses changed. The analysis included changes in population-level responses over time. The analyses were repeated separately for COVID and non-COVID individuals, and the responses were compared.

Results
There were 62 042 distinct participants who had COVID-19: 49 120 were identified through diagnosis, 30 553 through viral testing, and 934 through self-reporting, with many identified by more than one method. Overall case counts and the distribution of cases by data source changed substantially over time.
Of the 9 952 participants who completed the exposure questionnaire, 6 899 responded for every time period it covered, and their responses changed considerably over time. The most common change was in employment situation, which 74.78% of individuals changed between the first and last time of asking. At the population level, face mask usage increased in each successive time period. Nearly every COVID-19 symptom was reported less often in the second questionnaire than in the first. Compared with non-COVID participants, COVID participants were more commonly keyworkers (COVID: 33.76%, non-COVID: 15.00%) and more often lived with young people attending school (COVID: 61.70%, non-COVID: 45.32%).

Conclusion
Multiple types of data were needed to develop a robust cohort of COVID-19 participants from the UK Biobank population. The differences by time period and exposure show the importance of comprehensive data capture and the utility of COVID-19 related questionnaire data.
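
The cohort definition described in Methods, a union of participants flagged by diagnosis, viral testing, or self-report, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `build_covid_cohort`, the source labels, and the participant IDs are all hypothetical, and the toy numbers are made up rather than drawn from UK Biobank. The sketch only shows why the per-source counts can sum to more than the number of distinct participants when individuals are identified by more than one method.

```python
from collections import Counter


def build_covid_cohort(sources):
    """Map each participant ID to the set of sources that flagged them.

    `sources` is a dict of {source_name: set_of_participant_ids}.
    The keys of the returned dict form the cohort (the union of all sources).
    """
    flagged_by = {}
    for source_name, participant_ids in sources.items():
        for pid in participant_ids:
            flagged_by.setdefault(pid, set()).add(source_name)
    return flagged_by


# Toy, made-up participant IDs (assumption: NOT real UK Biobank data).
sources = {
    "diagnosis": {101, 102, 103, 104},
    "viral_test": {103, 104, 105},
    "self_report": {104, 106},
}

flagged_by = build_covid_cohort(sources)

# Distinct participants in the cohort (union across sources).
cohort_size = len(flagged_by)

# Count of participants per identification source.
per_source_counts = {name: len(ids) for name, ids in sources.items()}

# Participants identified by more than one source.
multi_source = sum(1 for found_in in flagged_by.values() if len(found_in) > 1)

# How many participants were flagged by exactly 1, 2, or 3 sources.
overlap_histogram = Counter(len(found_in) for found_in in flagged_by.values())

print("cohort size:", cohort_size)
print("per-source counts:", per_source_counts)
print("identified by multiple sources:", multi_source)
print("sources per participant:", dict(overlap_histogram))
```

In this toy example the per-source counts sum to 9 while the cohort contains 6 distinct participants. The reported counts show the same pattern: 49 120 + 30 553 + 934 exceeds the 62 042 distinct participants.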