Abstract

Jump models switch infrequently between states to fit a sequence of data while taking the ordering of the data into account. We propose a new framework for joint feature selection, parameter and state-sequence estimation in jump models. Feature selection is necessary in high-dimensional settings where the number of features is large compared to the number of observations and the underlying states differ only with respect to a subset of the features. We develop and implement a coordinate descent algorithm that alternates between selecting the features and estimating the model parameters and state sequence, which scales to large data sets with large numbers of (noisy) features. We demonstrate the usefulness of the proposed framework by comparing it with a number of other methods on both simulated and real data in the form of financial returns, protein sequences, and text. The resulting sparse jump model outperforms all other methods considered and is remarkably robust to noise.
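
The coordinate descent described in the abstract can be made concrete with a short sketch. The following is a minimal illustration, not the authors' implementation: it fits a jump model with Gaussian state means by alternating between (1) a dynamic program that finds the optimal state sequence under a fixed penalty lam per jump and (2) a per-state mean update. The function name, signature, and defaults are assumptions for illustration.

```python
import numpy as np

def fit_jump_model(X, K=2, lam=1.0, max_iters=10, seed=0):
    """Illustrative jump-model fit by coordinate descent (assumed API).

    Objective: sum_t ||x_t - mu_{s_t}||^2 + lam * (number of jumps in s).
    X: (T, p) array of observations; K: number of states; lam: jump penalty.
    """
    rng = np.random.default_rng(seed)
    T, p = X.shape
    mu = X[rng.choice(T, K, replace=False)]  # initialize means from random rows
    s = np.full(T, -1)
    for _ in range(max_iters):
        # Squared-error loss of each observation under each state mean: (T, K).
        loss = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        # Dynamic program: V[t, k] = best cost of a sequence ending in state k at t.
        V = loss.copy()
        for t in range(1, T):
            V[t] += np.minimum(V[t - 1], V[t - 1].min() + lam)
        # Backtrack an optimal state sequence.
        s_new = np.empty(T, dtype=int)
        s_new[-1] = V[-1].argmin()
        for t in range(T - 2, -1, -1):
            k = s_new[t + 1]
            s_new[t] = k if V[t, k] <= V[t].min() + lam else V[t].argmin()
        if np.array_equal(s_new, s):
            break  # state sequence has converged
        s = s_new
        for k in range(K):  # update each state's mean from its assigned rows
            if (s == k).any():
                mu[k] = X[s == k].mean(axis=0)
    return mu, s
```

In practice the fit would be repeated from several initializations and the best objective kept; with lam = 0 the alternation reduces to K-means, and larger lam yields fewer, more persistent state changes.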

Highlights

  • Jump models are a general class of latent variable models useful for describing systems characterized by abrupt, persistent changes of state dynamics

  • We compare the accuracy of the standard jump model and the sparse jump model to that of the greedy Gaussian segmentation (GGS) approach proposed by Hallac et al. (2019) in determining the break points between the five articles

  • We propose a general framework for joint feature selection, parameter and state-sequence estimation in jump models

Introduction

Jump models are a general class of latent variable models useful for describing systems characterized by abrupt, persistent changes of state dynamics. Feature selection is used in many statistical methods, including clustering, classification, and regression models. It is perhaps most common in supervised learning, where labeled training data are available to determine the distinguishing features. We present a feature selection framework for jump models inspired by Witten and Tibshirani (2010), who proposed a feature selection approach for clustering based on alternating between selecting feature weights and clustering a weighted version of the data. Through computational experiments on two simulated and three real data sets, we show that the jump model with feature selection is remarkably robust to noise and able to select relevant features, and that it yields more accurate state estimates than a number of other common methods, including HMMs. Our data and implementation in Python are available online as supplementary material.
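
The Witten and Tibshirani (2010)-style weight update can be sketched as follows; the name `feature_weights` and its arguments are our own illustrative assumptions, not the authors' code. Given a nonnegative per-feature score vector a (e.g., how much each feature reduces within-state dispersion) and an L1 budget s with 1 <= s <= sqrt(p), the weights solve max_w w·a subject to ||w||_2 <= 1, ||w||_1 <= s, w >= 0, which amounts to soft-thresholding a and renormalizing, with the threshold found by bisection.

```python
import numpy as np

def feature_weights(a, s):
    """Illustrative weight update in the style of Witten and Tibshirani (2010).

    a: nonnegative per-feature scores (at least one positive entry assumed);
    s: L1 budget with 1 <= s <= sqrt(len(a)).
    Returns w maximizing w @ a subject to ||w||_2 <= 1, ||w||_1 <= s, w >= 0.
    """
    a = np.asarray(a, dtype=float)
    w = a / np.linalg.norm(a)
    if w.sum() <= s:          # L1 constraint inactive: no thresholding needed
        return w
    lo, hi = 0.0, a.max()     # bisect on the soft-threshold level
    for _ in range(50):       # interval shrinks to a negligible width
        delta = 0.5 * (lo + hi)
        w = np.maximum(a - delta, 0.0)
        nrm = np.linalg.norm(w)
        if nrm > 0:
            w /= nrm
        if w.sum() > s:
            lo = delta        # too many features survive: raise the threshold
        else:
            hi = delta        # feasible: try a lower threshold
    return w
```

In the sparse jump model, this weight update alternates with refitting the jump model on the feature-weighted data, so that noisy features receive weight zero and drop out of the state estimation.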

Related work
Jump models
Jump penalty
Estimation
Feature selection
Sparse K-means clustering
Sparse jump models
Simulation study
Models
Results when features are conditionally independent
Results when noise features are correlated
Industry portfolio returns
Results
Chromatin proteins
Data description
Chromatin states
Conclusion