Classification of Anti-learnable Biological and Synthetic Data

Adam Kowalczyk

doi:10.1007/978-3-540-74976-9_19

Abstract

We demonstrate a binary classification problem in which standard supervised learning algorithms such as linear and kernel SVM, naive Bayes, ridge regression, k-nearest neighbors, shrunken centroid, multilayer perceptron and decision trees perform in an unusual way. On certain data sets they classify a randomly sampled training subset nearly perfectly, but systematically perform worse than random guessing on cases unseen in training. We demonstrate this phenomenon in the classification of a natural data set of cancer genomics microarrays using a cross-validation test. Additionally, we generate a range of synthetic datasets, the outcomes of 0-sum games, for which we analyse this phenomenon in the i.i.d. setting.

Furthermore, we propose and evaluate a remedy that yields promising results for classifying such data as well as normal datasets. We simply transform the classifier scores by an additional 1-dimensional linear transformation developed, for instance, to maximize the classification accuracy of the outputs of an internal cross-validation on the training set. We also discuss the relevance to other fields such as learning theory, boosting, regularization, sample bias and application of kernels.

Keywords: Esophageal Cancer; Synthetic Data; Synthetic Dataset; Ridge Regression; Kernel Support Vector Machine. (These keywords were added by machine and not by the authors.)
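The remedy described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy Gram matrix `K[i][j] = [i == j] - c * y_i * y_j` (a valid kernel for `c < 1/m`, in which same-class points are less similar than different-class points), the simple kernel-sum classifier, and the helper names `make_gram`, `score`, `fit_sign` and `run`. The 1-dimensional linear transformation `a * s + b` is restricted here to a sign choice `a` in {+1, -1} with `b = 0`, selected by leave-one-out cross-validation inside the training set.

```python
def make_gram(labels, c):
    """Toy Gram matrix (assumed construction): K[i][j] = [i == j] - c * y_i * y_j."""
    m = len(labels)
    return [[(1.0 if i == j else 0.0) - c * labels[i] * labels[j]
             for j in range(m)] for i in range(m)]


def score(K, labels, train, t):
    """Kernel-sum score of point t: sum of y_j * K[t][j] over training indices."""
    return sum(labels[j] * K[t][j] for j in train)


def accuracy(scores, targets):
    """Fraction of scores whose sign agrees with the +1/-1 target."""
    hits = sum(1 for s, y in zip(scores, targets) if s * y > 0)
    return hits / len(targets)


def fit_sign(K, labels, train):
    """Choose a in {+1, -1} maximizing accuracy of inner leave-one-out scores."""
    inner_scores, inner_targets = [], []
    for j in train:
        rest = [i for i in train if i != j]  # inner training fold
        inner_scores.append(score(K, labels, rest, j))
        inner_targets.append(labels[j])
    return max((+1, -1),
               key=lambda a: accuracy([a * s for s in inner_scores], inner_targets))


def run(n_per_class=5):
    """Outer leave-one-out: returns (worst training accuracy, raw test accuracy,
    test accuracy after the sign transformation)."""
    labels = [+1] * n_per_class + [-1] * n_per_class
    m = len(labels)
    K = make_gram(labels, c=1.0 / (2 * m))
    raw, fixed, train_accs = [], [], []
    for t in range(m):
        train = [i for i in range(m) if i != t]
        s = score(K, labels, train, t)      # raw score of the unseen point
        a = fit_sign(K, labels, train)      # transformation fitted on training data only
        raw.append(s)
        fixed.append(a * s)
        train_accs.append(accuracy([score(K, labels, train, j) for j in train],
                                   [labels[j] for j in train]))
    return min(train_accs), accuracy(raw, labels), accuracy(fixed, labels)


if __name__ == "__main__":
    print(run())
```

On this toy construction `run()` returns `(1.0, 0.0, 1.0)`: every training set is classified perfectly, every held-out point is misclassified by the raw score, and the sign chosen by the inner cross-validation corrects all held-out predictions. A fuller version would also fit the offset `b`; it is fixed to zero here to keep the sketch short.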


