Abstract

Identifying structure underlying high-dimensional data is a common challenge across scientific disciplines. We revisit correspondence analysis (CA), a classical method revealing such structures, from a network perspective. We present the little-known equivalence of CA to spectral clustering and graph-embedding techniques. We point out a number of complementary interpretations of CA results, other than its traditional interpretation as an ordination technique. These interpretations relate to the structure of the underlying networks. We then discuss an empirical example drawn from ecology, where we apply CA to the global distribution of Carnivora species to show how both the clustering and ordination interpretations can be used to find gradients in clustered data. In a second empirical example, we revisit the economic complexity index as an application of correspondence analysis, and use the different interpretations of the method to shed new light on the empirical results within this literature.

Highlights

  • Identifying structure underlying high-dimensional data is a common challenge across scientific disciplines

  • We show how the different interpretations of the mathematics behind correspondence analysis (CA) can help in interpreting economic complexity; in particular, we focus on the interpretation of higher-order eigenvectors and eigenvalues, which had not previously been considered in the context of economic complexity

  • We provide an overview of different mathematical derivations that all lead to CA


Summary

Introduction

Identifying structure underlying high-dimensional data is a common challenge across scientific disciplines. Many systems in the natural and social sciences are characterized by high-dimensional data sets describing the interactions between the objects of study. Such data can be analyzed using statistical methods that reduce their complexity by identifying the low-dimensional structures that define the systems’ main features. Data represented in that way can be used to infer the associations (or similarities) between nodes of the same type, for example by considering how often species occur together in the same site. In network terms, this entails ‘projecting’ the bipartite network onto one of its node sets, leading to a similarity network[1]. Ecologists have been investigating the existence of latent variables that determine which species occur in which sites, a practice known as gradient analysis or ordination.
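
To make the projection step concrete, the following is a minimal sketch in Python rather than the procedure used in the paper. It assumes a small, made-up site-by-species presence/absence matrix and a hypothetical helper, project_onto_columns, that counts how many sites each pair of species shares. The resulting matrix can be read as the weighted adjacency matrix of a species similarity network.

```python
import numpy as np

# Hypothetical presence/absence data: rows are sites, columns are species.
# The values are invented purely for illustration.
incidence = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
])


def project_onto_columns(b):
    """Project a bipartite incidence matrix onto its column nodes.

    Entry (i, j) of the result counts the rows (sites) in which
    column nodes i and j (species) are both present.
    """
    co_occurrence = b.T @ b
    # The diagonal holds each species' own site count, not a similarity
    # between distinct species, so it is set to zero here.
    np.fill_diagonal(co_occurrence, 0)
    return co_occurrence


similarity = project_onto_columns(incidence)
print(similarity)
```

In this toy example the first two species share two sites, the last two share one, and the first and third never co-occur. Raw co-occurrence counts are only one possible choice of similarity; other weightings or normalizations of the projection are equally plausible.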
