Abstract

Features of any data type have a categorical nature that histograms reveal. A contingency table framed by two histograms yields directional and mutual associations, based on rescaled conditional Shannon entropies, for any feature pair. The heatmap of the mutual association matrix of all features serves as a roadmap showing which features are highly associated with which. We build our data analysis paradigm, called categorical exploratory data analysis (CEDA), on this heatmap. CEDA is shown to provide new resolutions for two topics: multiclass classification (MCC) with a single categorical response variable, and response manifold analytics (RMA) with multiple response variables. In both topics, we compute visible and explainable information contents with multiscale and heterogeneous deterministic and stochastic structures. MCC involves all feature-group-specific mixing geometries of labeled high-dimensional point clouds. For each identified feature-group, we devise an indirect distance measure, a robust label embedding tree (LET), and a series of tree-based binary competitions to discover and present asymmetric mixing geometries. A chain of complementary feature-groups then offers a collection of mixing geometric pattern-categories with multiple perspective views. RMA studies a system’s regulating principles via multidimensional manifolds jointly constituted by the targeted response features and selected major covariate features. Each manifold is marked with categorical localities reflecting major effects. Diverse minor effects are checked and identified across all localities for heterogeneity. Both MCC and RMA information contents are computed as the data’s information content, with predictive inferences as by-products. We illustrate CEDA developments on the Iris data and demonstrate its applications on data taken from the PITCHf/x database.
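To make the entropy-based association concrete, here is a minimal Python sketch of how directional and mutual associations might be computed from a contingency table. The abstract does not state the exact rescaling, so this sketch assumes the directional association is the conditional entropy H(Y|X) divided by the marginal entropy H(Y), and that the mutual association is the average of the two directions. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def shannon_entropy(counts):
    """Shannon entropy (natural log) of a vector of non-negative counts."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts[counts > 0] / total
    return float(-(p * np.log(p)).sum())

def directional_association(table):
    """Assumed rescaling: H(Y | X) / H(Y), with rows = X categories, columns = Y."""
    table = np.asarray(table, dtype=float)
    h_y = shannon_entropy(table.sum(axis=0))
    if h_y == 0:
        return 0.0  # Y has a single category, so nothing is left to explain
    row_totals = table.sum(axis=1)
    n = table.sum()
    # Weighted average of the row-wise entropies of Y given each X category
    h_y_given_x = sum(
        (row_totals[i] / n) * shannon_entropy(table[i])
        for i in range(table.shape[0])
        if row_totals[i] > 0
    )
    return h_y_given_x / h_y

def mutual_association(table):
    """Assumed symmetric version: average of the X-to-Y and Y-to-X directions."""
    table = np.asarray(table, dtype=float)
    return 0.5 * (directional_association(table) + directional_association(table.T))

# Hypothetical 3 x 2 contingency table: rows are bins of X, columns are bins of Y
example = [[4, 0], [0, 4], [2, 2]]
print(directional_association(example), mutual_association(example))
```

Under this assumed convention, a value near 0 means one feature nearly determines the other, while a value near 1 means the two features are close to independent.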

Highlights

  • The author of the well-known 1977 book Exploratory Data Analysis (EDA) [1], John W

  • After computing a label embedding tree (LET) and predictive maps for any feature-group, the third task of our categorical exploratory data analysis (CEDA) for multiclass classification (MCC) is to discover an effective chain of complementary feature-sets.

  • Under the setting of quantitative covariate features, we develop computational algorithms and protocols of CEDA for MCC and response manifold analytics (RMA) to extract data’s multiscale information contents with heterogeneity.


Summary

Introduction

The author of the well-known 1977 book Exploratory Data Analysis (EDA) [1], John W.

Before our CEDA developments under the MCC and RMA settings, a collection of computational concepts and devices used in this paper is illustrated with the well-known and straightforward Iris data example. These concepts and devices, in increasing order of complexity, include the possibly gapped histogram, the contingency table, directional and mutual conditional entropy-based associations, mixing geometry in R^k with various dimensions k (≥ 2), and the information contents of MCC and RMA.

We visualize multiscale block patterns upon the 19 × 19 matrix lattice, as shown in Figure 4A–C for the three pitch types fastball, curveball, and slider, respectively. Such a feature-group is identified via an evident block within a heatmap and collectively stands for either a biomechanical or a physical mechanism within the pitching dynamics.

In contrast with our goal in this paper, MCC’s predictive capability is just one of the essential by-products of its full information content.
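The first two devices in that list, the possibly gapped histogram and the contingency table, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it swaps in plain equal-width bins for whatever histogram construction the paper actually uses, treats empty bins as the "gaps", and uses hypothetical function names and synthetic data rather than the Iris measurements.

```python
import numpy as np

def histogram_categories(values, n_bins=5):
    """Turn a continuous feature into bin codes 0..n_bins-1 (equal-width bins, an assumption)."""
    values = np.asarray(values, dtype=float)
    edges = np.histogram_bin_edges(values, bins=n_bins)
    # Digitize against interior edges so the maximum value lands in the last bin
    codes = np.digitize(values, edges[1:-1])
    return codes, edges

def gapped_bins(codes, n_bins):
    """Indices of empty bins, read here as the gaps of a possibly gapped histogram."""
    counts = np.bincount(codes, minlength=n_bins)
    return [int(i) for i in np.flatnonzero(counts == 0)]

def contingency_table(codes_a, codes_b, n_a, n_b):
    """Cross-tabulate two categorical code vectors into an n_a x n_b count table."""
    table = np.zeros((n_a, n_b), dtype=int)
    np.add.at(table, (codes_a, codes_b), 1)
    return table

# Synthetic feature with two well-separated clusters, plus a two-class label
x = np.concatenate([np.linspace(0.0, 0.9, 50), np.linspace(4.0, 5.0, 50)])
labels = np.repeat([0, 1], 50)

codes, edges = histogram_categories(x, n_bins=5)
print("empty bins:", gapped_bins(codes, 5))
print(contingency_table(codes, labels, 5, 2))
```

The resulting table is the object on which conditional entropy-based associations are then evaluated; a table whose mass sits in distinct rows for each label, as here, indicates that the binned feature separates the labels.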

Multiscale Complexity in MCC Information Content
CEDA for MCC on Slider Data
Predictive Map of Mixing Geometric Pattern Categories
Slider’s MCC Information Content
Curveball’s MCC Information Content
Dissecting Uncertainty of Results from Machine Learning Algorithm
Findings
Conclusions from MCC and RMA Perspectives