Survey of the US Small and Medium Providers in the Residential Broadband Internet: New Sampling Approaches

Boris Houenou, Steve Lanning

doi:10.2139/ssrn.3749620

Abstract

While FCC 477 data report on more than two thousand providers, most sampling strategies that try to address the well-known shortcomings of that data rely on, at best, a couple of hundred providers, and most rely on fewer than 40. Limited sampling raises selection and representation concerns that could affect the quality of the collected data, especially when such data could be used to form policies aimed at reducing the digital divide. Those concerns escalate quickly because data are strategic assets within the industry. With more complete sampling, we show that these concerns are realized in meaningful ways. Few industries pose the challenge of scarce publicly available data to the same extent as broadband Internet. Information about the pricing and product features of top ISPs is treated as a generalizable fact in the sector, leaving out the more than two thousand small and medium providers that serve a substantial share of US households. This paper surveys as many small providers as possible in the US. It bases the sampling design on available data on the variation in prices, one-time fees and data rates in residential broadband Internet, in order to verify that there is little or no variation within the sampling groups. We draw random samples covering more than two thousand providers, across combinations of provider, state, county type and technology type. First, using Whistleout data, we validate that almost all of the price variation is captured at the county type level. We might be concerned that the Whistleout sample does not capture the variation in our sampling frame. To check for this possibility, we selected a ten percent sample for two major providers and validated the information against actual observed prices. Our sampled results match the actual prices, and we use this as the basis for sampling across all providers, which overcomes the limitation of sampling too few providers.
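The abstract does not spell out the exact test behind "almost all of the price variation is captured at county type level". A minimal sketch, assuming a hypothetical record schema with a `county_type` label and a numeric `price`, frames the check as a one-way variance decomposition. It reports the share of total price variance explained by the grouping (eta squared) together with the one-way ANOVA F-statistic. This illustrates the general technique only and is not the authors' procedure.

```python
from collections import defaultdict
from statistics import mean


def variance_decomposition(records, group_key, value_key="price"):
    """Return (eta_squared, f_statistic) for a one-way grouping of a numeric field.

    records:   iterable of dicts; the schema (e.g. {"county_type": ..., "price": ...})
               is a hypothetical stand-in for the paper's data
    group_key: field that defines the groups, e.g. a county-type label
    value_key: numeric field whose variance is decomposed
    """
    groups = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r[value_key])

    n = sum(len(vs) for vs in groups.values())
    k = len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")

    grand_mean = mean([v for vs in groups.values() for v in vs])
    group_means = {g: mean(vs) for g, vs in groups.items()}

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(vs) * (group_means[g] - grand_mean) ** 2 for g, vs in groups.items())
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - group_means[g]) ** 2 for g, vs in groups.items() for v in vs)
    ss_total = ss_between + ss_within

    # Share of total variance explained by the grouping (eta squared).
    eta_squared = ss_between / ss_total if ss_total else 0.0
    # One-way ANOVA: F = [SS_between / (k - 1)] / [SS_within / (n - k)].
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k)) if ss_within else float("inf")
    return eta_squared, f_stat


# Toy prices (invented for illustration, not data from the paper).
records = [
    {"county_type": "metro", "price": 50.0},
    {"county_type": "metro", "price": 52.0},
    {"county_type": "rural", "price": 70.0},
    {"county_type": "rural", "price": 68.0},
]
print(variance_decomposition(records, "county_type"))
```

In this toy case, eta squared is 324/328 (about 0.99) and F is 162. A value of eta squared close to 1 would be consistent with county type capturing nearly all price variation. Running the check separately within each state, or against a finer sub-county grouping, is an assumed extension. For production work, `scipy.stats.f_oneway` computes the same F-statistic and also returns a p-value.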
Merged tables of Merkle US household data, FCC 477 data and USDA RUCC codes form the data universe. After validating that there is little to no statistical price variation below the county type level within a state, we randomly sampled five US residential addresses for each tuple of state, county type, technology code and provider company, and these five addresses constitute a block unit of observation. The sample totals 46,796 addresses. Household addresses, county type, data rate and provider company are drawn from the Merkle data, the USDA RUCC data and the FCC 477 data. We then manually collected the price, equipment and activation fees, and data rate of each provider's plans available at a given address. To do so, we used the service availability tool on the provider's website (when available) or searched the address on interactive coverage area maps, which are the relevant source for most WISPs. The information at a block unit of observation is projected back to a census block. By means of data visualization, descriptive statistics and analysis of variance, we aim to represent variation in price, one-time fees and data rate (download speed) at a resolution as fine as a US census block. Price and data rate variation are location-based, and the direction and magnitude of the variation are not homogeneous at either the rural or the urban level. While overreporting is a well-known issue with FCC 477 reports, underreporting is also likely. For instance, in Nevada, overreporting and underreporting concern 80% and 20% of census blocks, respectively. The correlation between price and data rate is also heterogeneous, depending on county type and technology tier in urban versus rural areas. A sampling approach that focuses on top providers might be cost-effective, but it does not necessarily deliver quality information. Such an approach overlooks variation in price and data rate that only an inclusive approach, one that also samples small providers, can capture. This research is novel in two respects.
First, it applies a consistent sampling technique to collect data on pricing and one-time fees, improving on publicly available pricing data for the broadband Internet industry in the US. Second, it constructs a data universe that offers a more realistic picture of the landscape for tracking the magnitude of the digital divide, affordability and serviceability. This picture includes the small and medium providers that most competitive pricing collection methods miss.
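The stratified draw described in the abstract selects five residential addresses for every (state, county type, technology code, provider) tuple. It can be sketched as follows. The field names and the household-record layout are hypothetical, because the merged Merkle/FCC 477/USDA RUCC table is not reproduced here. The rule that a stratum with fewer than five candidate addresses keeps all of them is an assumption, since the abstract does not say how the authors handled such strata.

```python
import random
from collections import defaultdict

# Hypothetical column names for the merged household table.
STRATUM_FIELDS = ("state", "county_type", "technology_code", "provider")


def sample_blocks(households, per_stratum=5, seed=None):
    """Randomly draw up to `per_stratum` addresses from each stratum tuple.

    Returns a dict mapping (state, county_type, technology_code, provider)
    to the list of sampled addresses (one "block unit of observation").
    """
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    strata = defaultdict(list)
    for h in households:
        key = tuple(h[f] for f in STRATUM_FIELDS)
        strata[key].append(h["address"])

    blocks = {}
    for key, addresses in strata.items():
        # Assumption: a stratum with fewer addresses than requested keeps all of them.
        k = min(per_stratum, len(addresses))
        blocks[key] = rng.sample(addresses, k)  # sampling without replacement
    return blocks


# Toy households (invented for illustration).
households = [
    {"state": "NV", "county_type": "rural", "technology_code": 70,
     "provider": "ProviderA", "address": f"{i} Main St"}
    for i in range(8)
] + [
    {"state": "NV", "county_type": "metro", "technology_code": 50,
     "provider": "ProviderB", "address": "1 Elm St"},
]
for stratum, addrs in sample_blocks(households, seed=42).items():
    print(stratum, addrs)
```

Keying the strata on a tuple keeps the unit of observation explicit. Each sampled block can later be joined back to the census blocks of its addresses for the projection step.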
