While data plays an important role in transportation research, sampled data is not always reliable, and the reliability problem is especially significant for minority groups. This study proposes a districting approach, adapted from the max-p-regions problem, that improves data reliability by aggregating basic spatial units (BSUs). The model generates as many aggregated zones as possible while minimizing intrazonal heterogeneity and limiting the data margin of error (MOE) of every aggregated zone through a controlling MOE threshold. The problem is first formulated as an integer program that selects an optimal set of zones from a pre-generated set of candidate zones. Because the difficulty of solving this formulation lies in generating the candidate set, a heuristic solution algorithm is proposed. Two case studies illustrate the method and validate its performance by evaluating the resulting data quality in a subsequent planning model. The first covers an area in Downtown Manhattan with 62 census tracts and compares the aggregated zones with Neighborhood Tabulation Areas (NTAs) and Taxi Zones. The second generates the New York City Equitable Zoning (NYCEZ), comprising 574 Equitable Zones that reduce the average MOE% of demographic data by 48% for seniors, 75% for the low-income population, and 46% for long commuters, with more districts than either NTAs (221) or Taxi Zones (263). NYCEZ and census tracts are then compared in a subsequent model, synthetic population generation, where the proposed zone design improves the standard deviation across simulated populations by 6.2%, meaning NYCEZ yields smaller variation in the generated population data. The algorithm can support decision making by public agencies and service design by mobility providers by producing reliable and equitable data. It can also be applied to data sharing between mobility providers and agencies to alleviate privacy concerns.
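To make the aggregation idea concrete, the following minimal sketch shows why merging basic spatial units tends to lower MOE%. It is an illustration under stated assumptions, not the heuristic proposed in the study. It assumes the root-sum-of-squares approximation commonly used for the MOE of a sum of survey estimates, and it treats the units as a one-dimensional chain in which neighbouring list entries are contiguous. The function names (`aggregate_moe`, `moe_percent`, `greedy_zones`) and the sample numbers are hypothetical, and intrazonal heterogeneity is ignored.

```python
import math


def aggregate_moe(moes):
    """Approximate MOE of a sum of estimates as the root sum of squares.

    Assumption: unit-level sampling errors are treated as independent.
    """
    return math.sqrt(sum(m * m for m in moes))


def moe_percent(estimates, moes):
    """MOE of the aggregated estimate as a percentage of that estimate."""
    total = sum(estimates)
    if total == 0:
        return float("inf")
    return 100.0 * aggregate_moe(moes) / total


def greedy_zones(units, threshold_pct):
    """Group consecutive (estimate, moe) units until MOE% meets the threshold.

    Hypothetical toy procedure: closing a zone as soon as it is feasible
    keeps the number of zones high, loosely in the spirit of max-p-regions.
    """
    zones, current = [], []
    for unit in units:
        current.append(unit)
        estimates = [e for e, _ in current]
        moes = [m for _, m in current]
        if moe_percent(estimates, moes) <= threshold_pct:
            zones.append(current)
            current = []
    if current:
        # Leftover units that never reached the threshold are attached to
        # the last zone; this does not guarantee that zone stays feasible.
        if zones:
            zones[-1].extend(current)
        else:
            zones.append(current)
    return zones


# Hypothetical (estimate, MOE) pairs for five neighbouring units.
sample_units = [(100, 60), (120, 50), (400, 80), (300, 70), (110, 40)]
sample_zones = greedy_zones(sample_units, threshold_pct=30.0)
for zone in sample_zones:
    est = [e for e, _ in zone]
    moe = [m for _, m in zone]
    print(len(zone), round(moe_percent(est, moe), 1))
```

Because the aggregated MOE grows with the square root of the summed squared MOEs while the estimate grows linearly, the ratio usually falls as units are merged, which is the effect the districting approach exploits. A real implementation would also need to enforce spatial contiguity on a two-dimensional adjacency graph and account for heterogeneity within each zone.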