Editor's evaluation: Disentangling the rhythms of human activity in the built environment for airborne transmission risk: An analysis of large-scale mobility data

Niel Hens

doi:10.7554/elife.80466.sa0

Abstract

Background: Since the outset of the COVID-19 pandemic, substantial public attention has focused on the role of seasonality in impacting transmission. Misconceptions have relied on seasonal mediation of respiratory diseases driven solely by environmental variables. However, seasonality is expected to be driven by host social behavior, particularly in highly susceptible populations. A key gap in understanding the role of social behavior in respiratory disease seasonality is our incomplete understanding of the seasonality of indoor human activity.

Methods: We leverage a novel data stream on human mobility to characterize activity in indoor versus outdoor environments in the United States. We use an observational mobile app-based location dataset encompassing over 5 million locations nationally. We classify locations as primarily indoor (e.g. stores, offices) or outdoor (e.g. playgrounds, farmers markets), disentangling location-specific visits into indoor and outdoor, to arrive at a fine-scale measure of indoor to outdoor human activity across time and space.

Results: We find the proportion of indoor to outdoor activity during a baseline year is seasonal, peaking in winter months. The measure displays a latitudinal gradient with stronger seasonality at northern latitudes and an additional summer peak in southern latitudes. We statistically fit this baseline indoor-outdoor activity measure to inform the incorporation of this complex empirical pattern into infectious disease dynamic models. However, we find that the disruption of the COVID-19 pandemic caused these patterns to shift significantly from baseline and the empirical patterns are necessary to predict spatiotemporal heterogeneity in disease dynamics.
Conclusions: Our work empirically characterizes, for the first time, the seasonality of human social behavior at a large scale with a high spatiotemporal resolution, and provides a parsimonious parameterization of seasonal behavior that can be included in infectious disease dynamics models. We provide critical evidence and methods necessary to inform the public health of seasonal and pandemic respiratory pathogens and improve our understanding of the relationship between the physical environment and infection risk in the context of global change.

Funding: Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award number R01GM123007.

Editor's evaluation

This is a valuable study characterizing seasonal deviations in indoor activity at the county level in the United States with relevance to respiratory disease transmission. The strength of evidence is solid. This study and its results are of potential interest to those people constructing more evidence-based infectious disease transmission models.

https://doi.org/10.7554/eLife.80466.sa0

Introduction

The seasonality of infectious diseases is a widespread and familiar phenomenon. Although a number of potential mechanisms driving seasonality in directly transmitted infectious diseases have been proposed, the causal process behind seasonality is still largely an open question (Martinez, 2018; Altizer et al., 2006; Grassly and Fraser, 2006). In the case of the influenza virus, seasonal changes in humidity have been identified as a potential mechanism, with drier winter months enhancing transmission (Shaman and Kohn, 2009; Shaman et al., 2010; Dalziel et al., 2018); similar patterns have been observed for respiratory syncytial virus and hand, foot, and mouth disease (Baker et al., 2019; Onozuka and Hashizume, 2011).
However, humidity is but one of many mechanisms contributing to seasonality in infectious disease transmission. Seasonal changes in temperature, human mixing patterns, and the immune landscape, among other factors, are thought to contribute to transmission dynamics (Metcalf et al., 2009; Mossong et al., 2008; Kronfeld-Schor et al., 2021; Bakker et al., 2021; Altizer et al., 2006). The relative importance of these disparate mechanisms varies across directly-transmitted pathogens and is still largely unexplained (Martinez, 2018; Grassly and Fraser, 2006). The influence of seasonal host behavior on respiratory disease seasonality remains particularly understudied (Fisman, 2012; Kronfeld-Schor et al., 2021) except for a few notable examples (Bharti et al., 2011; Few et al., 2013; Kummer et al., 2022). For respiratory pathogens spread via the aerosol transmission route, in particular, seasonality may be mediated by multiple behaviorally-driven mechanisms. Aerosol transmission, a significant mode of transmission for a number of respiratory pathogens including tuberculosis, measles, and influenza (Tellier et al., 2019), has become increasingly acknowledged during the COVID-19 pandemic (Greenhalgh et al., 2021; Wang et al., 2021; Jayaweera et al., 2020; Klompas et al., 2020; Morawska and Milton, 2020). The role of aerosols in respiratory disease transmission allows for transmission outside of the traditional 6 ft. radius and 15 min duration for the droplet mode and implicates human mixing in indoor locations with poor ventilation as being a high risk for transmission, regardless of the intensity of the social contact. While more is known about the spatiotemporal variation in environmental factors such as temperature and humidity in the indoor environment (e.g. Nguyen and Dockery, 2016) and about the impact these factors have on airborne pathogen transmission (e.g.
Robey and Fierce, 2022; Yang and Marr, 2011), limited information is available on rates of human indoor activity and how this varies geographically and seasonally. In the United States, most studies quantifying indoor and outdoor time are conducted in the context of air pollutants, suffer from small study sizes, lack spatiotemporal resolution, and are outdated. The most cited estimates originate from the 1980s-90s and estimate that Americans spend upwards of 90% of their time indoors (Ott, 1988); more recent data agree with these estimates (Klepeis et al., 2001; Spalt et al., 2016). While it is well understood that seasonal differences and latitude likely affect time spent indoors, little is known of the spatiotemporal variation in indoor activity beyond this one monolithic estimate, vastly limiting our ability to comprehensively characterize the seasonality of airborne disease exposure risk. Because our understanding of the drivers of seasonality for respiratory diseases has been limited, the modeling of seasonally-varying infectious disease dynamics has been traditionally done using environmental data-driven or phenomenological approaches. Environmental data-driven approaches incorporate seasonality into epidemiological models through environmental correlates of seasonality, such as solar exposure or outdoor temperature (Bakker et al., 2021; Baker et al., 2019; Coletti et al., 2018). This approach to seasonal dynamics controls for interseasonal variation in transmission dynamics and measures the strength of correlations between proposed metrics and seasonal variation in force of infection – although the observed relationship is rarely causally relevant for respiratory disease transmission. In contrast, phenomenological models such as seasonal forcing approaches modulate transmissibility over time without specifying a particular mechanism for this modulation (Keeling et al., 2001; Altizer et al., 2006). 
By applying well-understood functions (such as sine functions), seasonal forcing allows for flexible specification and quantification of dynamics, such as periodicity or oscillation damping, and indirectly captures seasonal variation in nonenvironmental factors such as school mixing. A significant remaining gap in seasonal infectious disease modeling is thus the ability to empirically incorporate spatiotemporal variation in behavioral mechanisms driving seasonality of disease exposure and transmission. Thus, despite the role of the indoor built environment in exposure to the airborne transmission route, seasonal variation in indoor human mixing has not yet been systematically characterized nor integrated into mathematical models of seasonal respiratory pathogens. To address this gap, we construct a novel metric quantifying the relative propensity for human mixing to be indoors at a fine spatiotemporal scale across the United States. We derive this metric using anonymized mobile GPS panel data of visits of over 45 million mobile devices to approximately 5 million public locations across the United States. We find a systematic latitudinal gradient, with indoor activity patterns in the northern and southern United States following distinct temporal trends at baseline. However, we find that the COVID-19 pandemic disrupted this structure. Lastly, we fit simple parametric models to incorporate these seasonal activity dynamics into models of infectious disease transmission when indoor activity is expected to be at baseline. Our work provides the evidence and methods necessary to inform the epidemiology of seasonal and pandemic respiratory pathogens and improve our understanding of the relationship between the physical environment and infection risk in light of global change. 
Methods

Data source

We use the SafeGraph Weekly Patterns data, which provides foot traffic at public locations ('points of interest', hereafter referred to as POIs) across the United States based on the usage of mobile apps with GPS (Safegraph, 2021a). The data are from 2018–2020, and 4.6 million POIs are sampled in all years of our study. The data are anonymized by applying noise and omitting data associated with a single mobile device, and are provided at the weekly temporal scale. Data are sampled from over 45 million smartphone devices (of approximately 275–290 million smartphone devices in the United States during 2018–2021; Statista Digital Market Outlook, 2022) and do not include devices that are out of service, powered off, or opted out of location services. This is secondary data analysis, so no informed consent or consent to publish was necessary. Ethical review for this study (STUDY00003041) was sought from the Institutional Review Board at Georgetown University and was approved on October 14, 2020.

Defining indoor activity seasonality

SafeGraph POIs are locations where consumers can spend money and/or time and include schools, hospitals, parks, grocery stores, restaurants, etc., but do not include home locations. (In Figure 1—figure supplement 1, we show that time at home does not display significant seasonal variation.) Each POI is assigned a six-digit North American Industry Classification System (NAICS) code in the SafeGraph Core Places dataset to classify each location into a business category. We classify each six-digit NAICS code (363 unique codes in total) as primarily indoor (e.g. schools, hospitals, grocery stores) or primarily outdoor (e.g. parks, cemeteries, zoos). We classify some locations as unclear if the location is a potentially mixed indoor and outdoor setting (e.g. gas stations with convenience stores, automobile dealerships).
Approximately 90% of POIs were classified as indoors, 6.5% were classified as outdoors, and 3.5% were classified as unclear. In Figure 1—figure supplement 2, we illustrate the robustness of our metric to the classification of unclear locations.

We define $\tilde{\sigma}_{it}$ (Equation 1) as the propensity for visits to be to indoor locations relative to outdoor locations. We aggregated raw visit counts, where a visit is counted when a device is present at a non-home POI for longer than one minute, to all indoor POIs and all outdoor POIs in a given week ($t$) at the US county level ($i$). Visit counts are normalized by the maximum visit counts for indoor or outdoor locations in each county during the year 2019. (In Figure 1—figure supplement 3, we show that the maximum visit count is comparable in 2018 and 2019.)

$$\tilde{\sigma}_{it} = \frac{N_{it}^{\text{indoor}} / \max_t\{N_{it}^{\text{indoor}}\}}{N_{it}^{\text{outdoor}} / \max_t\{N_{it}^{\text{outdoor}}\}} \tag{1}$$

This metric is then mean-centered to arrive at a relative measure of indoor activity seasonality, $\sigma_{it}$, which is comparable across all counties:

$$\sigma_{it} = \frac{\tilde{\sigma}_{it}}{\mu_{\tilde{\sigma}}} \tag{2}$$

We note that $\mu_{\tilde{\sigma}}$ is not spatially structured (see Figure 1—figure supplement 4).

As a data cleaning step, we use spatial imputation for any county-weeks where sample sizes are small. For location-weeks in which the total visit count is less than 100, we impute the indoor activity seasonality using an average of σ in the neighboring locations (where neighbors are defined based on shared county borders). This affects 0.6% of all county-weeks and a total of 79 (out of 3143) counties.

Time series clustering analysis

To characterize groups of US counties with similar indoor activity dynamics, we use a complex networks-based time series clustering approach. We first calculate the pairwise similarity between z-normalized indoor activity time series for each pair of counties, $i$ and $j$, using the Pearson correlation coefficient ($\rho_{ij}$).
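The metric in Equations 1 and 2 can be sketched for a single county as follows. This is a minimal illustration, not the authors' pipeline: the function name and the toy counts are hypothetical, and for simplicity the maxima and the mean are taken over the series passed in, whereas the text normalizes by the 2019 maximum.

```python
def indoor_activity_seasonality(indoor_visits, outdoor_visits):
    """Sketch of Equations 1 and 2 for one county.

    indoor_visits, outdoor_visits: weekly visit counts of equal length,
    all strictly positive (assumption made to avoid division by zero).
    """
    max_indoor = max(indoor_visits)
    max_outdoor = max(outdoor_visits)

    # Equation 1: ratio of max-normalized indoor visits to
    # max-normalized outdoor visits, week by week.
    raw = [
        (n_in / max_indoor) / (n_out / max_outdoor)
        for n_in, n_out in zip(indoor_visits, outdoor_visits)
    ]

    # Equation 2: divide by the mean of the raw metric so that a value
    # of 1.0 corresponds to the county's average indoor propensity.
    mean_raw = sum(raw) / len(raw)
    return [value / mean_raw for value in raw]


# Toy two-week example with made-up counts (not SafeGraph data).
sigma = indoor_activity_seasonality([100, 50], [50, 100])
print(sigma)
```

In this toy case the first week comes out at roughly 1.6 and the second at roughly 0.4, and the values average to 1 by construction, which matches the interpretation of σ as a deviation from the county-specific average.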
For pairs of locations where $\rho_{ij}$ is in the top 10% of all correlations, we represent the pairwise time series similarities as a weighted network where nodes are US counties and edges represent strong time series similarity. (In Figure 2—figure supplement 1, we show the robustness of our clustering results to this choice of correlation threshold.) We then cluster the time series similarity network using community structure detection. This method effectively clusters nodes (counties) into groups of nodes that are more connected within than between. The resulting clustering thus represents a regionalization of the United States in which regions consist of counties that have more similar indoor activity dynamics to each other than to other regions. One benefit of the network-based community detection approach over other clustering methods is that community detection does not require user specification of the number of clusters (regions, in this case); instead, the number of clusters emerges organically from the data connectivity (Aggarwal and Reddy, 2013). For community detection, we use the Louvain method (Blondel et al., 2008), a multiscale method in which modularity is first optimized using a greedy local algorithm. We apply it to the similarity network with edge weights (i.e. time series correlations) using an igraph implementation in Python (Louvain-igraph, 2018).

We performed a robustness assessment of the community structure using a set of 25 'bootstrap networks,' $B_i$. For each bootstrap network, the edge weight (i.e. the time series correlation) for each edge of the network was perturbed by noise $\epsilon \sim N(0, 0.05)$. The community structure algorithm was performed on each bootstrap network. A consensus value was then calculated as the sum of the normalized mutual information between the community structure partition of the bootstrap network $B_i$ and all other bootstrap networks. The partition with the largest consensus value was defined as the robust community structure partition.
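The construction of the similarity network described above (pairwise Pearson correlations, keeping only the top 10%) can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' code: the function names, the county labels, and the toy series are hypothetical, and the community detection and bootstrap steps are not included.

```python
import math
from itertools import combinations


def z_normalize(series):
    """Center a series and scale it to unit (population) standard deviation."""
    mean = sum(series) / len(series)
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [(x - mean) / sd for x in series]  # assumes a non-constant series


def pearson(x, y):
    """Pearson correlation computed from z-normalized series."""
    zx, zy = z_normalize(x), z_normalize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


def similarity_edges(series_by_county, top_fraction=0.10):
    """Return (county_a, county_b, correlation) for the strongest pairs."""
    pairs = [
        (a, b, pearson(series_by_county[a], series_by_county[b]))
        for a, b in combinations(sorted(series_by_county), 2)
    ]
    # Sort from most to least similar and keep the top fraction of pairs.
    pairs.sort(key=lambda edge: edge[2], reverse=True)
    n_keep = max(1, math.ceil(top_fraction * len(pairs)))
    return pairs[:n_keep]


# Toy series for five hypothetical counties (made-up values).
toy = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [4, 3, 2, 1],
    "D": [1, 3, 2, 4],
    "E": [2, 1, 4, 3],
}
edges = similarity_edges(toy)
print(edges)
```

With five counties there are ten pairs, so the 10% threshold keeps a single edge, here the pair whose series are exact multiples of each other. The retained edges, with correlations as weights, are what a community detection routine would then take as input.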
Given some known limitations to the time series correlation network-based approach to clustering (Hoffmann et al., 2020), we validated our network-based clustering results with another common clustering method. In particular, we used hierarchical clustering with Ward linkage and Euclidean distance on z-normalized indoor activity time series, implemented using scipy in Python. (We note that Euclidean distance is equivalent to Pearson's correlation on normalized time series; Berthold and Höppner, 2016.) The results of this comparison are summarized in Figure 2—figure supplement 5.

Disruptions to indoor activity due to pandemic response

We investigate the COVID-19 pandemic's impact on indoor activity seasonality by comparing pre-pandemic mobility patterns in 2018 and 2019 with mobility patterns during the COVID-19 pandemic in 2020. We compared the proportion of indoor visits at the county level, $\sigma_{it}$, across 2018, 2019, and 2020 to examine changes in indoor activity seasonality during the COVID-19 pandemic. We also examined total activity, aggregating visits to all indoor, outdoor, and unclear POIs by week and mean-centering them for each US county during the COVID-19 pandemic in 2020.

Incorporating indoor activity into infectious disease models

We seek to illustrate the impact of incorporating seasonality into an infectious disease model using a phenomenological model versus empirical data. To achieve this, we parameterize a simple compartmental disease model with a seasonality term, using either our empirically-derived indoor activity seasonality metric or an analytical phenomenological model of seasonality fit to this metric.

Phenomenological model of seasonality

We first fit our empirically-derived indoor activity seasonality metric using a time-varying non-linear model. We specify the time-varying effect as a sinusoidal function, as is commonly done to incorporate seasonality into infectious disease models phenomenologically.
The indoor activity seasonality, $\sigma_{it}$, for cluster $i$ at week $t$ is specified as:

$$\sigma_{it} = 1 + \alpha_i \sin(\omega_i t + \phi_i),$$

where $\alpha_i$ is the sine wave amplitude, $\omega_i$ is the frequency, and $\phi_i$ is the phase. We fit a model for locations in the northern cluster separately from those in the southern cluster, as identified above. We fit the parameters for this model using nlme, a standard R package for fitting Gaussian nonlinear models.

Disease model

We model infectious disease dynamics through a simple SIR model of disease spread:

$$\frac{dS}{dt} = -\beta_0 \beta(t) S I$$

$$\frac{dI}{dt} = \beta_0 \beta(t) S I - \gamma I$$

$$\frac{dR}{dt} = \gamma I$$

We incorporate alternative seasonality terms to consider the impact of heterogeneity in indoor seasonality on disease dynamics. For the northern and southern clusters separately, we define modeled seasonality as $\beta(t) = 1 + \alpha \sin(\omega t + \phi)$, with the fitted parameters for each cluster (Figure 4—figure supplement 1 and Figure 4—figure supplement 2). We also consider two exemplar locations for empirical estimates of seasonality, where $\beta(t) = \sigma_t$ after rolling window smoothing: Cook County for an example county from the northern cluster, and Maricopa County for an example location from the southern cluster. We also compare against a null expectation where $\beta(t) = 1$. (All seasonality functions are illustrated in Figure 4—figure supplement 3.) We assume that $\beta_0 = 0.0025$ and $\gamma = 2$ (on a weekly time scale).

Results

Based on anonymized location data from mobile devices, we construct a novel metric that measures the relative propensity for human activity to be indoors at a fine geographic (US county) and temporal (weekly) scale. Activity is measured as the number of visits to unique physical, public (non-residential) locations across the United States. Locations are classified as indoors if they are enclosed environments (i.e. buildings and transportation services). We characterize the systematic spatiotemporal structure in this metric of indoor activity seasonality with a time series clustering analysis.
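The seasonally forced SIR model from the Disease model section can be sketched numerically as follows. This is a minimal illustration under stated assumptions: it uses forward-Euler integration (a simplification chosen for brevity), the population size of 1000 and the sinusoid parameters are hypothetical placeholders rather than the fitted values, and only β0 = 0.0025 and γ = 2 come from the text.

```python
import math


def sinusoidal_forcing(alpha, omega, phi):
    """Return beta(t) = 1 + alpha * sin(omega * t + phi), with t in weeks."""
    return lambda t: 1.0 + alpha * math.sin(omega * t + phi)


def simulate_sir(beta_of_t, s0, i0, beta0=0.0025, gamma=2.0, weeks=52.0, dt=0.01):
    """Forward-Euler integration of the SIR model with seasonal term beta(t).

    Returns a list of (t, S, I, R) tuples. S, I, R are counts, so the
    transmission term is beta0 * beta(t) * S * I as in the equations above.
    """
    s, i, r = float(s0), float(i0), 0.0
    trajectory = [(0.0, s, i, r)]
    n_steps = int(round(weeks / dt))
    for step in range(n_steps):
        t = step * dt
        new_infections = beta0 * beta_of_t(t) * s * i * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
        trajectory.append((t + dt, s, i, r))
    return trajectory


# Null expectation, beta(t) = 1, for a hypothetical population of 1000.
null_run = simulate_sir(lambda t: 1.0, s0=999, i0=1)

# Sinusoidal forcing with placeholder parameters (annual period in weeks).
forced_run = simulate_sir(
    sinusoidal_forcing(alpha=0.2, omega=2 * math.pi / 52, phi=0.0),
    s0=999,
    i0=1,
)

print(null_run[-1])
print(forced_run[-1])
```

To use an empirical seasonality term instead, `beta_of_t` could be replaced by a function that looks up the smoothed weekly value of σ for a given county, for example by indexing a list with `int(t)`.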
We also characterize the shift that occurred in the baseline patterns of indoor activity seasonality during the COVID-19 pandemic. We note that this seasonal variation in the propensity of human activity to be indoors differs from the variation in overall rates of contact or mobility, which does not appear to be highly seasonal (Figure 1—figure supplement 1; Klein et al., 2022). Lastly, we fit non-linear models to the indoor activity metric at baseline, comparing the ability of a simple model to capture seasonal variation in transmission risk.

Quantifying empirical dynamics in indoor activity

The indoor activity seasonality metric, σ, captures the relative frequency of visits to indoor versus outdoor locations within an area. The components of σ capture the degree to which indoor and outdoor locations are occupied; when σ=1, a given county is at its county-specific average propensity (over time) for indoor activity relative to outdoor. When σ<1, activity within the county is more frequently outdoor and less frequently indoor than average, while σ>1 indicates that activity is more frequently indoor and less frequently outdoor than average. Thus, a σ of 1.2 indicates that the county's activity is 20% more indoor than average, and a σ of 0.80 indicates that the county's activity is 20% less indoor than average (additional details in Methods). Through this metric, we measure the relative propensity for human activity to be indoors for every community (i.e. US county) across time (at a weekly timescale), finding significant heterogeneity between counties (Figure 1A). The representative examples of Cook County, Illinois (home of the city of Chicago in the northern US) and Maricopa County, Arizona (home of the city of Phoenix in the southwestern US) highlight systematic spatial and temporal heterogeneity in indoor activity dynamics. In Cook County, indoor activity varies over time, at its peak in the winter, with the relative odds of an indoor visit well above average.
During the summer, σ in Cook County reaches its trough, with activity systematically more outdoors on average. On the other hand, the variation of σ across time in Maricopa County is characterized by a smaller winter peak in indoor activity, and an additional peak in the summer (i.e. July and August); this peak occurs concurrently with the trough in Cook County. Unlike in Cook County, σ in Maricopa County is lowest in the spring and fall. These representative counties illustrate the systematic within-county variation in indoor activity over time, as well as the between-county variation in temporal trends as represented in Figure 1B for all US communities.

Figure 1 (with 4 supplements). Spatio-temporal heterogeneity in indoor activity seasonality. (A) Case studies to highlight varying trends in indoor activity seasonality during 2018 and 2019: King County and Suffolk County (in the northern United States) have high indoor activity in the winter months and a trough in indoor activity in the summer months. Miami-Dade and Maricopa County (in the southern United States) see moderate indoor activity in the winter and may have an additional peak in indoor activity during the summer. We apply a rolling window mean for visualization purposes. (B) A heatmap of the indoor activity seasonality metric for all US counties by week for 2018 and 2019. Counties are ordered by latitude. We see significant spatiotemporal heterogeneity with distinct trends in the summer versus winter seasons.

To identify systematic geographic structure, we cluster the heterogeneous time series of county-level, weekly indoor activity. We find three geographic clusters corresponding to groups of locations that experience similar indoor activity dynamics (Figure 2). These clusters primarily split the country into two clusters: a northern cluster and a southern cluster.
Among the communities in the northern cluster, activity is more commonly outdoor over the summer months, trending toward indoor during fall, with a peak in the winter months, as observed in Cook County. Comparatively, the southern cluster has a larger winter peak (i.e. between December and February) and a smaller summer peak (i.e. between July and August); most summer peaks are less extreme than that of Maricopa County (shown). We hypothesize that these two clusters are consistent with climate zones. While there is a moderate association between indoor activity seasonality and environmental variables such as temperature and humidity (Figure 2—figure supplement 2), we expect that the northern and southern indoor activity clusters will be more consistent with climate zones defined for the construction of the indoor built environment and find that there is indeed substantial consistency between the two (Figure 2—figure supplement 3). The third cluster differs substantially: it is geographically discontiguous and its two annual peaks occur during the spring (close to April) and fall (closer to November) seasons. Thus, the counties in this cluster have outdoor activity more frequently than average during both the winter and the summer. The counties in this cluster correspond to locations that are hubs for winter or other tourism, which we speculate is driving their unique dynamics (Figure 2—figure supplement 4).

Figure 2 (with 5 supplements). Using a time series clustering approach on the indoor activity time series for each US county, we identify groups of counties that experience similar trends in indoor activity. Locations in the northern cluster (light blue) follow a single peak pattern with the highest indoor activity occurring every winter. Locations in the southern cluster (dark blue) experience two peaks in indoor activity each year, one in the winter and a second, smaller one in the summer.
The third cluster also experiences two peaks not matching environmental conditions, but potentially corresponding to winter or other tourism areas. We apply a rolling window mean to the time series for visualization purposes. Characterizing pandemic disruption to baseline indoor activity seasonality In addition to the description of indoor activity seasonality at baseline, we examine the impact of a large-scale disruption – the COVID-19 pandemic – on these patterns. We compare indoor activity seasonality during the COVID-19 pandemic in 2020 to the baseline patterns of 2018 and 2019. We find that the temporal trends in indoor activity are less geographically structured in 2020 than those of previous years (see Figure 3—figure supplement 2 for a characterization of the time series patterns). We find that indoor activity deviated from pre-pandemic trends beyond interannual deviations (Figure 3—figure supplement 1). We focus on four case studies to highlight the varying impacts on indoor activity of the pandemic disruption (Figure 3). In all four communities, 2020 indoor activity trends shift from 2018 and 2019 patterns, with Maricopa County (home of the city of Phoenix, Arizona) showing the least perturbation relative to prior years. We also find that in early 2020, when there was substantial social distancing in the United States (e.g. school closures, remote work), activity was more likely to be outdoors than in prior years, independent of changes in overall activity levels. 
With our case studies, we highlight that social distancing policies can have different impacts on airborne exposure risk in different locations: while some locations, such as Travis County (home of Austin, Texas), shifted activities outdoors during this period, reducing their overall risk further, other locations, such as Charleston County (home of Charleston, South Carolina), increased indoor activity above the seasonal average during this period, potentially diminishing the effect of reducing overall mobility. The trends in Charleston are representative of those in the southeastern United States during the spring of 2020 (Figure 3—figure supplement 1). By the end of 2020 (and the first winter wave of SARS-CoV-2), many parts of the country were shifting activity more outdoors than seasonally expected (Figure 3—figure supplement 1).

Figure 3 (with 2 supplements). Indoor activity during the COVID-19 pandemic was shifted. We compare indoor activity trends in the baseline years of 2018 and 2019 to the pandemic year 2020 in four case study locations. We find that most locations saw a shift in their indoor activity patterns, while others (such as Maricopa County) did not. We also find that while overall activity was diminished uniformly during the spring of 2020, indoor activity decreased in some locations (Travis County, Texas and Baltimore County, Maryland) and increased in others (Charleston County, South Carolina). We apply a three-week rolling window mean to the time series for visualization purposes.

Implications for modeling seasonal disease dynamics

We use this finely-grained spatiotemporal information on indoor activity to incorporate airborne exposure risk seasonality into compartmental models of disease dynamics using common, coarser seasonal forcing approaches.
To investigate the impact of heterogeneity in σ on the estimation of seasonal forcing for infectious disease models, we fit a sinusoidal model to the time series of indoor activity for each of the primary clusters (Figure 4A). We note that because σ is defined as deviation from baseline indoor activity, the sinusoidal parameters (amplitude, frequency, phase) should be interpreted as a measure of seasonality in indoor activity, relative to each location’s baseline. We find that the parameters of seasonality vary across clusters: the amplitude is higher, and the phase is lower in the northern cluster compared to the southern cluster, indicating a difference in the variability of indoor and outdoor activity seasonality in each cluster (Figure 4—figure supplement 1). While the fits are comparable for both clusters (Figure 4—figure supplement 2), the sinusoidal model does not capture the second peak of indoor activity during the summer months in the southern cluster. These differences in best fit indicate that sinusoidal models may have an overly restrictive functional form, limiting the accuracy of the approximation, and may underestimate the impacts of seasonality on transmission, obscuring systematic differences between regions. Furthermore, differences in seasonal activity of the observed magnitude can have important implications for disease modeling; applying region-level and co

Full Text