Abstract
Background: Individuals are increasingly turning to search engines like Google to obtain health information and access resources. Analysis of Google search queries, part of the methodological toolkit of infodemiology and infoveillance researchers, offers a novel approach to understanding population health concerns and needs in real time or near-real time. While searches have predominantly been examined with the Google Trends website tool, newer application programming interfaces (APIs) are now available to academics to draw a richer landscape of searches. These APIs allow users to write code in languages like Python to retrieve sample data directly from Google servers.

Objective: The purpose of this paper is to describe a novel protocol to determine the top queries, the volume of queries, and the top sites reached by a population searching on the web for a specific health term. The protocol retrieves Google search data from three Google APIs: Google Trends, Google Health Trends (also referred to as Flu Trends), and Google Custom Search.

Methods: Our protocol consisted of four steps: (1) developing a master list of top search queries for an initial search term using Google Trends, (2) gathering information on relative search volume using Google Health Trends, (3) determining the most popular sites using Google Custom Search, and (4) calculating estimated total search volume. We tested the protocol following key procedures at each step and verified its usefulness by examining search traffic on birth control in 2017 in the United States. Two separate programmers working independently achieved similar results, with insignificant variation due to sample variability.

Results: We successfully tested the methodology on the initial search term "birth control." We identified the top search queries for birth control, of which "birth control pill" was the most popular, and obtained the relative and estimated total search volume for the top queries: relative search volume was 0.54 for the pill, corresponding to an estimated 9.3-10.7 million searches. We used the estimates of the proportion of search activity for the top queries to generate a list of the most popular websites: for the pill, the Planned Parenthood website was the top site.

Conclusions: The proposed methodological framework demonstrates how to retrieve Google query data from multiple Google APIs and provides the thorough documentation required to systematically identify search queries and websites, as well as to estimate relative and total search volume of queries in real time or near-real time in specific locations and time periods. Although the protocol needs further testing, it allows researchers to replicate the steps and shows promise in advancing our understanding of population-level health concerns.

International Registered Report Identifier (IRRID): RR1-10.2196/16543
Highlights
Individuals in the United States seeking health information online turn to search engines first.
We identified the top search queries for birth control, of which "birth control pill" was the most popular, and obtained the relative and estimated total search volume for the top queries: relative search volume was 0.54 for the pill, corresponding to an estimated 9.3-10.7 million searches.
The Google Health Trends application programming interface (API), previously known as Google Flu Trends, gives normalized relative search volume (RSV) across a set of search queries, allowing for more in-depth analysis of the relationships between queries. This RSV is the proportion of searches for a specific query relative to the sum total of searches for a set of queries. It differs from the relative search index given by Google Trends, which expresses search interest relative to all searches during the specified period of time.
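The RSV definition above (one query's share of the summed activity for a set of queries) can be illustrated with a short calculation. This is a minimal sketch, not the authors' code: the function name `relative_search_volume`, the query labels, and the raw values are all hypothetical and chosen only to show the normalization.

```python
def relative_search_volume(raw_values):
    """Return each query's share of the summed activity for the query set.

    `raw_values` maps a query string to a non-negative number, for example a
    per-query value retrieved for the same location and time period.
    """
    total = sum(raw_values.values())
    if total <= 0:
        raise ValueError("query set has no search activity to normalize")
    return {query: value / total for query, value in raw_values.items()}


# Hypothetical raw values for a three-query set (illustrative numbers only).
raw = {"query a": 50.0, "query b": 30.0, "query c": 20.0}
rsv = relative_search_volume(raw)
```

Because the values are normalized within the set, the shares sum to 1, and adding or removing a query from the set changes every other query's RSV.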
Summary
Individuals in the United States seeking health information online turn to search engines first. The Google Trends website tool has been used to study public interest in cancer [11,12], suicide assessment [13,14], depression-related information seeking [15], lifestyle-disease surveillance [16], bariatric surgery [17], herpes zoster vaccinations [18], searches for walk-in clinics and emergency departments [19], obesity-related behavior [20], and reproductive health [21,22,23,24,25,26]. Research using this tool increased over 20-fold between 2009 and 2018 [27]. Newer Google APIs allow users to write code in languages like Python to retrieve sample data directly from Google servers.
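As an illustration of retrieving data programmatically, the sketch below builds a request for the Google Custom Search JSON API and pulls site names out of a response. It is a hedged example, not the protocol's actual code: the helper names `build_custom_search_url` and `top_sites`, the placeholder credentials, and the choice of the `gl` and `num` parameters are assumptions made for illustration. No network request is made here.

```python
from urllib.parse import urlencode

# Public endpoint of the Custom Search JSON API.
CUSTOM_SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_custom_search_url(query, api_key, engine_id, country="us", num=10):
    """Build a GET URL for one search query.

    `api_key` and `engine_id` are the caller's own credentials; `country`
    sets the `gl` geolocation hint and `num` the number of results (max 10).
    """
    params = {"key": api_key, "cx": engine_id, "q": query,
              "gl": country, "num": num}
    return f"{CUSTOM_SEARCH_ENDPOINT}?{urlencode(params)}"


def top_sites(response):
    """Return the display domains of the result items, in ranked order."""
    return [item["displayLink"] for item in response.get("items", [])]


# Placeholder credentials; replace with real values before sending a request.
url = build_custom_search_url("birth control pill", "YOUR_API_KEY", "YOUR_ENGINE_ID")
```

The returned URL could then be fetched with any HTTP client and the decoded JSON passed to `top_sites`.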