Exploratory Data Analysis Methods Research Articles

Abstract. Methods for exploratory data analysis and exploratory spatial data analysis (ESDA) are useful for identifying outliers, clusters, skewed distributions, and correlations (see Figure 1 for examples implemented in our GeoDa software). Researchers routinely use these methods to find insights.

However, what motivated this project is that, by default, it is easier to find insights that confirm the expected. Often, in fields like geography, statistics, or computer science, available software and data drive both the choice of research questions and the way we explore data. Typical insights gained this way are descriptive, such as where a cluster is located or whether variables are correlated. To be sure, expected insights are reassuring, and descriptive insights are important. But instead of stopping there, we want to build on them to also find insights that are new and relevant: we want to discover the unexpected (Anselin, 1998; Kielman & May, 2009). And while we do need to know where clusters and outliers are, as researchers we also want to go further and explain why these patterns exist (Good, 1983). In this project we presume that there are ways of structuring the process of data exploration that make it more likely to discover unexpected and explanatory insights (Platt, 1964).

This presentation summarizes results from a summer 2020 lab in which we started experimenting with how to do this using our Center's GeoDa software. The summer lab was directed by Julia Koschinsky of the University of Chicago's Center for Spatial Data Science. Marcos Falcone helped mentor five young University of Chicago and high school students for 7–10 weeks (majoring in statistics, computation, geography, political science, and economics). Our approach was to draw on philosophy of science and scientific reasoning to understand how the discovery of unexpected and explicable insights can work. We then tried to translate this into research designs for ESDA.
Finally, we implemented the designs in replicable prototype examples for teaching and learning spatial research at the undergraduate or high school level.

For instance, in terms of scientific reasoning, classic work on causal explanations (Mill, 1843) augments the prevailing focus on correlations by also highlighting the need to assess the plausibility of one's own explanation against alternatives. This requires a mindset and practice of rigorously testing how our explanations might be wrong (Popper, 1959) rather than confirming that they are right (Nuzzo, 2015; Kahneman, 2011). Doing so requires an iterative exchange between data and explanations, referred to as abductive reasoning because it combines inductive and deductive approaches (Peirce, 1878; Heckman & Singer, 2017). We used Sherlock Holmes stories and the famous John Snow cholera case to illustrate the structure of these scientific reasoning concepts in a high school context (Coleman, 2019; cf. Konnikova, 2013; Vinten-Johansen, 2020).

Scientific reasoning goes back hundreds of years. Our challenge this summer, and from here on, has been to translate this reasoning into research designs that apply to modern interactive ESDA tools. Each of us developed four prototype resources for teaching and learning ESDA in GeoDa that we will develop further (Fig. 2): 1) protocols for how this could be done; 2) case examples to apply and revise the protocols; 3) GeoDa demo scripts to make the examples replicable; and 4) cleaned data and documentation. These resources will be released as part of a GeoDa Cookbook in the near future.

Fig. 3 illustrates one of the protocols, which differs from how ESDA is typically navigated. The starting point is the exploration of patterns in the outcome variable of interest. Next is the formulation of alternative explanations whose patterns plausibly match those of the outcome variable.
Then we draw on quasi-experimental research designs to structure the testing of this match (Shadish et al., 2002). Finally, data about the hypothesized explanations are analyzed with ESDA and regressions to test or reformulate the hypotheses as part of an abductive process.
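
Clusters and outliers, the typical ESDA findings named above, are commonly assessed in GeoDa with spatial autocorrelation statistics such as Moran's I. The following is a minimal, dependency-free Python sketch of global Moran's I with row-standardized rook-contiguity weights and a permutation pseudo p-value. The 4x4 grid, its values, and all function names are hypothetical and chosen for illustration; this is not GeoDa's implementation or the authors' protocol.

```python
import random

# Hypothetical example data: a 4x4 grid of areas (row-major order) where the
# left half has high values and the right half low values, i.e. a clear cluster.
VALUES = [1, 1, 0, 0] * 4


def rook_neighbors(rows, cols):
    """Rook contiguity: areas that share an edge on a rows x cols grid."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[r * cols + c] = [
                rr * cols + cc
                for rr, cc in candidates
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return nbrs


def morans_i(values, neighbors):
    """Global Moran's I with row-standardized weights (each row sums to 1)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    numerator = 0.0
    s0 = 0.0  # sum of all weights
    for i, nbrs in neighbors.items():
        if not nbrs:  # areas without neighbors contribute nothing
            continue
        w = 1.0 / len(nbrs)
        for j in nbrs:
            numerator += w * z[i] * z[j]
            s0 += w
    denominator = sum(zi * zi for zi in z)
    return (n / s0) * (numerator / denominator)


def permutation_p(values, neighbors, permutations=999, seed=0):
    """Pseudo p-value for positive autocorrelation via random relabeling."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbors) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (permutations + 1)


if __name__ == "__main__":
    w = rook_neighbors(4, 4)
    i_stat, p = permutation_p(VALUES, w)
    print(f"Moran's I = {i_stat:.3f}, pseudo p-value = {p:.3f}")
```

A clearly positive I means similar values sit next to each other, while values near the null expectation of -1/(n-1) suggest no spatial pattern. Local versions of the statistic (LISA), which GeoDa also offers, are what locate individual clusters and outliers on the map.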

Long-term community resilience, which takes a long view of chronic, slow-moving issues affecting communities, has begun to draw more attention from researchers and policymakers. In the Valley of the Sun, resilience to heat is both a necessity and a way of life. Solutions are ubiquitous but nevertheless still in demand over the long, hot summers in the Phoenix, Arizona metropolitan area. Residents rely heavily on air conditioning (AC) to relieve heat stress, avoid heat-related illness, and prevent indoor heat-related deaths. However, the electricity needed to keep homes cool can be expensive, and electric bills can be cost-prohibitive for many low-income individuals and families. Local government agencies, non-governmental organizations (NGOs), and charitable organizations run programs that provide qualified applicants with financial assistance, offering limited relief from electricity costs. To better understand the utility assistance landscape in the Phoenix metropolitan area as a contributor to heat resilience among vulnerable communities, we formed a collaborative team of individuals from the university and the Salvation Army, one of more than 80 organizations that provide emergency economic aid to help low-income families pay high electricity bills. The team's aim was to articulate data-informed insights into the efficiency and efficacy of the system. Working with the Salvation Army, we used exploratory data analysis and advanced spatial analytical methods to build a shared understanding of knowledge gaps and to verify hunches. Our collaborative research confirms that minority groups (African American and Native American) disproportionately require assistance. In addition, switching from zip code-based to address-based assignment systems could reduce travel time and distance to intake interviews by 30%. Budgeting according to empirically identified temporal patterns of need could offer resilience benefits to the most vulnerable.
As a result of this community research partnership, data from the Salvation Army reveal the character and dimensions of critical challenges within the utility assistance system as a whole. These data both inform immediate solutions and build a knowledge base for transforming the organization's future operations, while also shaping broader conversations among service providers about heat resilience in both spatial and temporal terms.
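
The abstract reports that address-based assignment would shorten travel to intake interviews compared with zip code-based assignment. The following is a minimal Python sketch of why that can happen, under loudly stated assumptions: two hypothetical intake sites, four hypothetical clients with placeholder zip labels, straight-line distance in place of road travel time, and a nearest-site rule with no capacity limits. It does not use the Salvation Army data, does not reproduce the study's spatial optimization method, and is not meant to recover its 30% figure.

```python
import math

# Hypothetical intake sites, as (x, y) coordinates in km on a local grid.
SITES = {"north": (0.0, 10.0), "south": (0.0, 0.0)}

# Hypothetical clients as (zip_label, x, y); zip labels are placeholders.
CLIENTS = [
    ("zip_A", 0.0, 9.0),
    ("zip_A", 0.5, 1.0),  # lives near the south site, but zip_A is routed north
    ("zip_B", 1.0, 0.5),
    ("zip_B", 0.0, 8.5),  # lives near the north site, but zip_B is routed south
]

# Hypothetical zip-code rule: every client in a zip code goes to one fixed site.
ZIP_TO_SITE = {"zip_A": "north", "zip_B": "south"}


def distance(a, b):
    """Straight-line distance, a stand-in for road travel distance or time."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def zip_based_total(clients, sites, zip_to_site):
    """Total distance when each client is sent to the site fixed for their zip."""
    return sum(distance((x, y), sites[zip_to_site[z]]) for z, x, y in clients)


def address_based_total(clients, sites):
    """Total distance when each client goes to the site nearest their address."""
    return sum(min(distance((x, y), s) for s in sites.values()) for _, x, y in clients)


if __name__ == "__main__":
    by_zip = zip_based_total(CLIENTS, SITES, ZIP_TO_SITE)
    by_address = address_based_total(CLIENTS, SITES)
    print(f"zip-based: {by_zip:.2f} km, address-based: {by_address:.2f} km")
    print(f"reduction: {1 - by_address / by_zip:.0%}")
```

Clients whose zip code straddles a service boundary are the ones who gain from address-based assignment. A real deployment would also need site capacities and road-network travel times, which this sketch ignores.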

Articles published on Exploratory Data Analysis Methods

Exploratory Data Analysis of Human Activity Recognition Based on Smart Phone

Exploratory Data Analysis for Social Big Data Using Regression and Recurrent Neural Networks

Spatio-temporal trends and influencing factors of PM2.5 concentrations in urban agglomerations in China between 2000 and 2016.

Research on coupling coordination and spatial differentiation of new-type urbanization and ecological environment in Wanjiang demonstration area

Discovering the Unexpected & Explicable: Scientific Reasoning and Research Design for Spatial Data Analysis

CLUSTER TYPOLOGY IN TILAPIA PRODUCTION: A STUDY FOR THE STATE OF PARANÁ

Approach to Urban Environmental Justice Using Exploratory Spatial Data Analysis. The Case of Valencia’s Monumental Trees

Exploratory data analysis on large data sets: The example of salary variation in Spanish Social Security Data

The potential of heat release rate and cylinder pressure feedback control for conventional and premixed charge compression ignition combustion

An Empirical Study on the Ecological Economy of the Huai River in China

Articulating strategies to address heat resilience using spatial optimization and temporal analysis of utility assistance data of the Salvation Army Metro Phoenix

Spatial heterogeneity of factors influencing transportation CO2 emissions in Chinese cities: based on geographically weighted regression model

Application of exploratory and Spatial Data Analysis (SDA), singularity matrix analysis, and fractal models to delineate background of potentially toxic elements: A case study of Ahvaz, SW Iran

The spatial-temporal variation and convergence of green innovation efficiency in the Yangtze River Economic Belt in China.

Identification of ecosystem services supply and demand areas and simulation of ecosystem service flows in Shanghai

Study on the spatial distribution characteristics of urban innovation power in Yangtze River Delta urban agglomeration

The Data Mining Group at University of Vienna

Using Trainable Neural Networks to Predict Events in IT Product Development

The financialization of single-family rental housing: An examination of real estate investment trusts’ ownership of single-family houses in the Atlanta metropolitan area

Sports Industry Agglomeration and Green Economic Growth—Empirical Research Based on Panel Data of 30 Provinces and Cities in China