Exploring high dimensional data with Butterfly: a novel classification algorithm based on discrete dynamical systems

Joseph Geraci, Moyez Dharsee, Paulo Nuin, Madhuri Koti, Alexandria Haslehurst, Harriet E Feilotter, Ken Evans

doi:10.1093/bioinformatics/btt602

Open Access

Abstract

We introduce a novel method for visualizing high-dimensional data via a discrete dynamical system. The method provides a 2D representation of the relationships between subjects according to a set of variables, without geometric projections, transformed axes or principal components. The algorithm exploits a memory-type mechanism inherent in a class of discrete dynamical systems collectively referred to as the chaos game, which are closely related to iterated function systems. The goal was to create a human-readable representation of high-dimensional patient data capable of detecting previously unrevealed subclusters of patients within anticipated classifications. When used with medical data, this provides a mechanism for a more personalized exploration of pathology. For clustering and classification protocols, the dynamical system portion of the algorithm is designed to come after a feature selection filter and before a model evaluation protocol (e.g. clustering accuracy). In the version given here, a univariate feature selection step is performed (in practice, more complex feature selection methods are used); a discrete dynamical system is driven by this reduced set of variables, which results in a set of 2D cluster models; these models are evaluated for their accuracy according to a user-defined binary classification; and finally a visual representation of the top classification models is returned. Thus, in addition to the visualization component, this methodology can be used for both supervised and unsupervised machine learning, as the top-performing models are returned in the protocol we describe here. Butterfly, the algorithm we introduce and for which we provide working code, uses a discrete dynamical system to classify high-dimensional data and provide a 2D representation of the relationships between subjects.
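To make the "memory-type mechanism" of the chaos game concrete, here is a minimal Python sketch. It is not the authors' R implementation, and the abstract does not specify how Butterfly maps variables onto moves. The sketch assumes, purely for illustration, that each selected variable is binned into four rank-based levels, that each level is tied to one corner of the unit square, and that each subject's point moves halfway toward the corner named by each of its variables in turn. The names `CORNERS`, `discretize_column`, `chaos_game_point` and `embed` are hypothetical.

```python
from typing import List, Sequence, Tuple

# One corner of the unit square per discretized level (an illustrative assumption).
CORNERS = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]


def discretize_column(column: Sequence[float], n_levels: int = 4) -> List[int]:
    """Assign each value a level 0..n_levels-1 from its rank (roughly equal-sized bins).

    n_levels must not exceed len(CORNERS). Ties are broken by input order.
    """
    order = sorted(range(len(column)), key=lambda i: column[i])
    levels = [0] * len(column)
    for rank, i in enumerate(order):
        levels[i] = rank * n_levels // len(column)
    return levels


def chaos_game_point(
    symbols: Sequence[int],
    ratio: float = 0.5,
    start: Tuple[float, float] = (0.5, 0.5),
) -> Tuple[float, float]:
    """Run the chaos game: for each symbol, move a fraction `ratio` toward its corner."""
    x, y = start
    for s in symbols:
        cx, cy = CORNERS[s]
        x += ratio * (cx - x)
        y += ratio * (cy - y)
    return (x, y)


def embed(rows: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Map each subject (one row of variable values) to a single 2D point."""
    columns = list(zip(*rows))
    level_columns = [discretize_column(c) for c in columns]
    # Re-transpose so that each tuple holds one subject's levels, in variable order.
    return [chaos_game_point(levels) for levels in zip(*level_columns)]


if __name__ == "__main__":
    # Four hypothetical subjects measured on two variables.
    for point in embed([[1, 10], [2, 20], [3, 30], [4, 40]]):
        print(point)
```

With `ratio=0.5`, every shared trailing move halves the distance between two trajectories, so subjects whose last few variables fall in the same levels land close together regardless of how they started. That is the sense in which the final 2D position "remembers" the driving sequence, and it is also why the order of the variables matters in this kind of construction.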
We report results on three datasets (two in the article; one in the appendix), including a public lung cancer dataset that ships with the included Butterfly R package. In the included R script, a univariate feature selection method is used for the dimension reduction step; in the future we wish to use a more powerful multivariate feature reduction method based on neural networks (Kriesel, 2007). An R script implementing the algorithm (designed to run in RStudio) accompanies this article and is available at http://butterflygeraci.codeplex.com/. For details on the R package, or for help installing the software, refer to the accompanying document, Supporting Material and Appendix.
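The univariate feature selection step can be illustrated with a short Python sketch. The abstract does not say which univariate statistic the R script uses, so this example assumes an absolute Welch t-statistic between the two user-defined classes, and keeps the `k` highest-scoring variables. The function names `welch_t` and `select_top_features` are hypothetical and are not taken from the Butterfly package.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Absolute Welch t-statistic between two samples (each needs at least 2 values)."""
    diff = abs(mean(a) - mean(b))
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups are constant: perfectly separated if the means differ.
        return float("inf") if diff > 0 else 0.0
    return diff / se


def select_top_features(
    rows: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> List[int]:
    """Score every variable on its own and return the indices of the k best.

    `labels` holds the binary class (0 or 1) of each subject in `rows`.
    """
    scores = []
    for j in range(len(rows[0])):
        group0 = [r[j] for r, y in zip(rows, labels) if y == 0]
        group1 = [r[j] for r, y in zip(rows, labels) if y == 1]
        scores.append((welch_t(group0, group1), j))
    # Highest score first; ties are broken by the lower variable index.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [j for _, j in scores[:k]]


if __name__ == "__main__":
    # Six hypothetical subjects, three variables; only the first separates the classes.
    data = [
        [1.0, 5.0, 0.10],
        [1.1, 4.0, 0.20],
        [0.9, 6.0, 0.15],
        [5.0, 5.5, 0.12],
        [5.2, 4.5, 0.18],
        [4.9, 5.0, 0.14],
    ]
    classes = [0, 0, 0, 1, 1, 1]
    print(select_top_features(data, classes, k=1))
```

Because each variable is scored independently, a filter of this kind cannot detect variables that are informative only in combination, which is the limitation the multivariate method mentioned above is meant to address.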

Journal: Bioinformatics
Publication Date: Oct 21, 2013
Citations: 6