Abstract

Bump-hunting, or mode identification, is a fundamental problem that arises in almost every scientific field of data-driven discovery. Surprisingly, very few data modeling tools are available for automatic (not requiring manual case-by-case investigation), objective (not subjective), and nonparametric (not based on restrictive parametric model assumptions) mode discovery that can scale to large data sets. This article introduces LPMode, an algorithm based on a new theory for detecting multimodality of a probability density. We apply LPMode to answer important research questions arising in fields ranging from environmental science, ecology, econometrics, and analytical chemistry to astronomy and cancer genomics.

Highlights

  • Many scientific problems seek to identify modes in the true unknown probability density function f(x) of a variable X, given i.i.d. observations X1, ..., Xn

  • Two classes of bump-hunting methods currently prevail in the literature, providing insights at different levels of granularity and detail: (i) testing multimodality or deviation from unimodality; (ii) determining how many modes are present in a probability density function

  • The idea of using kernel density estimates for nonparametric mode identification goes back to the seminal work of Parzen (1962). It was further studied by Silverman (1981) using the concept of “critical bandwidths” and bootstrapping, an approach known to be highly conservative, non-robust, and to give different answers depending on the calibration technique used
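
To make the kernel-density approach mentioned above concrete, the following is a minimal sketch of mode counting and of a critical bandwidth, here taken to mean the smallest bandwidth at which a Gaussian kernel density estimate shows at most k modes. It is not the LPMode algorithm, and it omits the bootstrap calibration step. The function names, the evaluation grid, the bisection bounds, and the simulated two-component sample are all illustrative assumptions, not taken from the article.

```python
import math
import random


def kde_on_grid(data, h, grid):
    """Gaussian kernel density estimate with bandwidth h, evaluated on grid."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
            for x in grid]


def count_modes(density):
    """Count interior local maxima of a density sampled on a grid."""
    return sum(1 for i in range(1, len(density) - 1)
               if density[i - 1] < density[i] >= density[i + 1])


def critical_bandwidth(data, k, grid, lo=1e-2, hi=10.0, tol=1e-3):
    """Bisection for the smallest h (up to tol) giving at most k modes.

    Assumes the estimate at the initial `hi` already has at most k modes and
    relies on the mode count of a Gaussian KDE not increasing with h.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_modes(kde_on_grid(data, mid, grid)) <= k:
            hi = mid   # smooth enough: try a smaller bandwidth
        else:
            lo = mid   # too many bumps: need more smoothing
    return hi


# Hypothetical sample: equal mixture of N(-2, 1) and N(2, 1).
random.seed(0)
data = ([random.gauss(-2.0, 1.0) for _ in range(150)]
        + [random.gauss(2.0, 1.0) for _ in range(150)])
grid = [-6.0 + 0.05 * i for i in range(241)]

h_one = critical_bandwidth(data, 1, grid)  # bandwidth needed to force unimodality
h_two = critical_bandwidth(data, 2, grid)  # bandwidth needed to allow two modes
print("critical bandwidth, k=1:", round(h_one, 3))
print("critical bandwidth, k=2:", round(h_two, 3))
```

A critical bandwidth for k = 1 that is large relative to the one for k = 2 suggests that heavy smoothing is required to erase a second bump. Deciding whether that gap is significant is the calibration step to which the criticisms above apply.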


Summary

Introduction

Many scientific problems seek to identify modes in the true unknown probability density function f(x) of a variable X, given i.i.d. observations X1, ..., Xn. The goal is to learn and compare the multimodal shape of each variable. This problem of finding structure in the form of hidden bumps arises in many data-intensive sciences. We address the intellectual challenge of developing a novel algorithm for ‘large-scale nonparametric mode exploration’, a problem of outstanding interest at the present time. Two classes of bump-hunting methods currently prevail in the literature, providing insights at different levels of granularity and detail: (i) testing multimodality or deviation from unimodality; (ii) determining how many modes are present in a probability density function. The purpose of this paper is to present a new genre of nonparametric technique for (iii) comprehensive mode identification: determining the number of modes (along with their locations), as well as standard errors or confidence intervals for the associated mode positions, to assess significance and uncertainty.

Two modeling cultures
Skew-G density representation
Constructing empirical orthogonal rank polynomials
Estimation and properties
Model denoising
Consistency of local mode estimates
LPMode algorithm and inference
Econometrics
Cancer genomics
Asteroid data
Galaxy color data
Analytical chemistry
Biological science
Philately
Ecological science
Simulation studies
Discussion
Findings
Methods

