Abstract
Bump-hunting, or mode identification, is a fundamental problem that arises in almost every scientific field of data-driven discovery. Surprisingly, very few data modeling tools are available for automatic (not requiring manual case-by-case investigation), objective (not subjective), and nonparametric (not based on restrictive parametric model assumptions) mode discovery that can scale to large data sets. This article introduces LPMode, an algorithm based on a new theory for detecting multimodality of a probability density. We apply LPMode to answer important research questions arising in fields ranging from environmental science, ecology, econometrics, and analytical chemistry to astronomy and cancer genomics.
Highlights
Many scientific problems seek to identify modes in the true unknown probability density function f(x) of a variable X, given i.i.d. observations X1, ..., Xn.
Two different classes of bump-hunting methods currently prevail in the literature, providing insights at different levels of granularity and detail: (i) testing multimodality or deviation from unimodality; (ii) determining how many modes are present in a probability density function.
The idea of using kernel density estimates for nonparametric mode identification goes back to the seminal work of Parzen (1962). It was further studied by Silverman (1981) using the concept of “critical bandwidths” and bootstrapping, an approach that is known to be highly conservative and non-robust, and to give different answers depending on the calibration technique used.
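To make the critical-bandwidth idea concrete, here is a minimal sketch in plain Python. It is not the LPMode algorithm; it only illustrates the kernel-based approach attributed above to Parzen and Silverman. The function names (`kde_on_grid`, `count_modes`, `critical_bandwidth`), the grid size, and the toy sample are assumptions made for illustration. The sketch relies on the known property that the number of modes of a Gaussian kernel density estimate does not increase as the bandwidth h grows, so the smallest h giving at most k modes can be located by bisection. The bootstrap calibration step of Silverman's test is omitted.

```python
import math

def kde_on_grid(sample, h, grid):
    """Gaussian kernel density estimate with bandwidth h, evaluated on grid."""
    norm = len(sample) * h * math.sqrt(2.0 * math.pi)
    return [sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in sample) / norm
            for g in grid]

def count_modes(density):
    """Count local maxima: places where the slope turns from + to -."""
    # Exact ties (e.g. underflow to 0.0 far from the data) are skipped.
    signs = [1 if b > a else -1
             for a, b in zip(density, density[1:]) if b != a]
    return sum(1 for s, t in zip(signs, signs[1:]) if s > 0 and t < 0)

def make_grid(sample, h, num=400):
    """Evenly spaced grid reaching 4 bandwidths beyond the data range."""
    lo, hi = min(sample) - 4.0 * h, max(sample) + 4.0 * h
    step = (hi - lo) / (num - 1)
    return [lo + i * step for i in range(num)]

def modes_at(sample, h):
    """Number of modes of the Gaussian KDE of sample at bandwidth h."""
    return count_modes(kde_on_grid(sample, h, make_grid(sample, h)))

def critical_bandwidth(sample, k=1, tol=1e-3):
    """Approximate smallest h whose Gaussian KDE has at most k modes.

    Assumes the sample has at least two distinct values. Accuracy is
    limited by the grid resolution used in make_grid.
    """
    lo, hi = tol, max(sample) - min(sample)
    while modes_at(sample, hi) > k:   # safeguard: widen until <= k modes
        hi *= 2.0
    while hi - lo > tol:              # bisection on the bandwidth
        mid = 0.5 * (lo + hi)
        if modes_at(sample, mid) <= k:
            hi = mid
        else:
            lo = mid
    return hi

# Toy sample (hypothetical): two evenly spaced clusters on [-4, -2] and [2, 4].
sample = [-4.0 + 2.0 * i / 49 for i in range(50)] + \
         [2.0 + 2.0 * i / 49 for i in range(50)]

print("modes at h = 0.5:", modes_at(sample, 0.5))
print("modes at h = 5.0:", modes_at(sample, 5.0))
print("critical bandwidth for k = 1:", round(critical_bandwidth(sample), 3))
```

In Silverman's procedure, a large critical bandwidth for k modes is taken as evidence that the density has more than k modes, and a bootstrap is then used to calibrate how large is "large". That calibration is the step the text above describes as conservative and sensitive to the technique chosen.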
Summary
Many scientific problems seek to identify modes in the true unknown probability density function f(x) of a variable X, given i.i.d. observations X1, ..., Xn. The goal is to learn and compare the multimodal shape of each variable. This problem of finding structures in the form of hidden bumps arises in many data-intensive sciences. We address the intellectual challenge of developing a novel algorithm for ‘large-scale nonparametric mode exploration’, a problem of outstanding interest at the present time. Two different classes of bump-hunting methods currently prevail in the literature, providing insights at different levels of granularity and detail: (i) testing multimodality or deviation from unimodality; (ii) determining how many modes are present in a probability density function. The purpose of this paper is to present a new genre of nonparametric mode identification technique for (iii) comprehensive mode identification: determining the number of modes (along with their locations), as well as standard errors or confidence intervals for the associated mode positions, to assess significance and uncertainty.