Abstract
Jump models switch infrequently between states to fit a sequence of data while taking the ordering of the data into account. We propose a new framework for joint feature selection, parameter and state-sequence estimation in jump models. Feature selection is necessary in high-dimensional settings where the number of features is large compared to the number of observations and the underlying states differ only with respect to a subset of the features. We develop and implement a coordinate descent algorithm that alternates between selecting the features and estimating the model parameters and state sequence, which scales to large data sets with large numbers of (noisy) features. We demonstrate the usefulness of the proposed framework by comparing it with a number of other methods on both simulated and real data in the form of financial returns, protein sequences, and text. The resulting sparse jump model outperforms all other methods considered and is remarkably robust to noise.
Highlights
Jump models are a general class of latent variable models useful for describing systems characterized by abrupt, persistent changes of state dynamics
We compare the accuracy of the standard jump model and the sparse jump model to that of the greedy Gaussian segmentation (GGS) approach proposed by Hallac et al. (2019) in determining the break points between the five articles
We propose a general framework for joint feature selection, parameter and state-sequence estimation in jump models
Summary
Jump models are a general class of latent variable models useful for describing systems characterized by abrupt, persistent changes of state dynamics. Feature selection works well in many statistical methods, including clustering, classification, and regression models. It is perhaps most common in supervised learning, where labeled training data is available to determine the distinguishing features. We present a feature selection framework for jump models inspired by Witten and Tibshirani (2010), who proposed a feature selection approach for clustering based on alternating between selecting feature weights and clustering a weighted version of the data. Through computational experiments on two simulated and three real data sets, we show that the jump model with feature selection is remarkably robust to noise and is able to select relevant features. It yields more accurate state estimates than a number of other common methods, including HMMs. Our data and implementation in Python are available online as supplementary material.
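The alternating scheme described above can be sketched in a few lines of Python. The following is an illustrative toy implementation under our own assumptions, not the paper's exact algorithm: it uses a squared-error loss, a fixed penalty per state change solved by dynamic programming, and an L2-normalized weight update from per-feature between-state separation in the spirit of Witten and Tibshirani (2010). The function name, parameters, and initialization heuristics are hypothetical, and the sparsifying soft-threshold step is omitted for brevity.

```python
import numpy as np

def fit_sparse_jump(X, K, jump_penalty=10.0, n_iter=10):
    """Toy sketch of a sparse jump model fit: alternate between
    (1) a jump-penalized state sequence on feature-weighted data and
    (2) feature weights from between-state separation."""
    T, p = X.shape
    w = np.full(p, 1.0 / np.sqrt(p))            # feature weights, ||w||_2 = 1
    idx = np.linspace(0, T - 1, K).astype(int)  # fallback rows for empty states
    # initialize with K contiguous blocks (states are assumed persistent)
    states = np.repeat(np.arange(K), -(-T // K))[:T]
    for _ in range(n_iter):
        Xw = X * np.sqrt(w)                     # weighted version of the data
        mu = np.stack([Xw[states == k].mean(axis=0) if np.any(states == k)
                       else Xw[idx[k]] for k in range(K)])
        # dynamic program: squared-error fit plus a fixed penalty per jump
        cost = ((Xw[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        V = cost[0].copy()
        back = np.zeros((T, K), dtype=int)
        for t in range(1, T):
            # trans[i, j] = value of being in state j at t-1, moving to i at t
            trans = V[None, :] + jump_penalty * (1.0 - np.eye(K))
            back[t] = trans.argmin(axis=1)
            V = cost[t] + trans.min(axis=1)
        states = np.empty(T, dtype=int)
        states[-1] = V.argmin()
        for t in range(T - 2, -1, -1):          # backtrack the optimal sequence
            states[t] = back[t + 1, states[t + 1]]
        # weight update: between-state sum of squares per feature, normalized
        # (a full implementation would also soft-threshold to an L1 budget)
        total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
        within = np.zeros(p)
        for k in range(K):
            m = states == k
            if m.any():
                within += ((X[m] - X[m].mean(axis=0)) ** 2).sum(axis=0)
        between = np.maximum(total - within, 0.0)
        w = between / (np.linalg.norm(between) + 1e-12)
    return w, states
```

On data where only a few features carry the state change, the weight vector concentrates on those features while the noise features receive weights near zero, which is the behavior the sparse jump model is designed to exploit.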