Machine learning for medical diagnostics : techniques for feature extraction, classification, and evaluation

Andrew Peter Bradley

doi:10.14264/uql.2016.727

Abstract

The use of computers as diagnostic aids in medicine is becoming a reality in the clinical arena; a major factor in this trend is the successful application of machine learning techniques. Three fundamentally different approaches to machine learning have been identified, which we call Exemplar, Hyper-plane, and Hyper-rectangle based methods. Part of this thesis is devoted to a novel hyper-rectangle based algorithm called the Multiscale Classifier (MSC), which is implemented as an inductive decision tree. The MSC can be applied to any N-dimensional classification problem, successively splitting feature space in half and using logic minimisation to control tree growth. Pruning techniques are then used to produce decision trees that are sensitive to the misclassification cost of examples. Such techniques are shown to produce different operational modes of classification, which may be visualised using the Receiver Operating Characteristic (ROC) curve. The MSC has several significant advantages over existing hyper-rectangle based approaches: learning is incremental; the tree is non-binary; and backtracking of decisions is possible. A feature extraction technique based on scale-space analysis is proposed and applied to texture measures extracted from images of cervical cell nuclei. Specifically, we model, as a function of scale, features derived from a Grey Level Co-occurrence Matrix (GLCM). On this data set the proposed technique was found to offer an improvement in performance over conventional feature extraction techniques. Methodologies for the evaluation of a number of machine learning algorithms (Bayesian, C4.5, K-NN, Perceptron, Multi-layer Perceptron, and the MSC) are explored using six real-world medical diagnostic data sets. The performance of each algorithm is evaluated in terms of overall accuracy, sensitivity, specificity, area under the ROC curve (AUC), χ² test statistic, training time, and interpretability.
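As a rough illustration of the GLCM-derived texture features mentioned above, the following Python sketch builds a normalised co-occurrence matrix for a tiny image patch and computes one common texture measure (contrast). It does not reproduce the scale-space modelling proposed in the thesis. The function names `glcm` and `contrast`, the horizontal offset of one pixel, the four grey levels, and the example patch are all assumptions made for illustration.

```python
import numpy as np

def glcm(image, levels, dr=0, dc=1):
    """Normalised co-occurrence matrix for pixel pairs at offset (dr, dc).

    Hypothetical helper: entry [i, j] is the fraction of pixel pairs whose
    first pixel has grey level i and whose offset neighbour has grey level j.
    """
    image = np.asarray(image)
    rows, cols = image.shape
    counts = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            # Skip pairs whose neighbour falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r, c], image[r2, c2]] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts

def contrast(p):
    """Contrast texture measure: sum of p[i, j] * (i - j)^2."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

# Assumed 4x4 patch with grey levels 0..3 (illustrative data only).
patch = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
])

p = glcm(patch, levels=4)
texture_contrast = contrast(p)
```

Repeating such a computation at several image scales would give each feature as a function of scale, which is the kind of quantity the abstract describes modelling; how the thesis does this is not stated here.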
For each data set, an Analysis of Variance (ANOVA) is used to test the statistical significance of any differences between the cross-validated estimates of the accuracy and AUC performance measures. The benefits of AUC over accuracy as a performance measure are discussed in terms of increased statistical sensitivity, independence from a decision threshold, and invariance to prior class probabilities. It was found that the exemplar and hyper-plane based methods had marginally higher accuracies when compared to the hyper-rectangle based methods. However, the hyper-rectangle based methods are often more interpretable and less computationally intensive. The MSC was found to compare favourably with the other learning algorithms and has been established as a useful additional tool for machine learning in medical diagnostics.
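The claimed properties of AUC (independence from a decision threshold and invariance to prior class probabilities) can be illustrated with a small Python sketch. It uses the standard rank-based interpretation of AUC: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The function names, the scores, and the threshold of 0.5 are assumptions for illustration, not data or code from the thesis.

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. No decision threshold is involved.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def accuracy(pos_scores, neg_scores, threshold):
    """Overall accuracy when scores >= threshold are called positive."""
    correct = sum(p >= threshold for p in pos_scores)
    correct += sum(n < threshold for n in neg_scores)
    return correct / (len(pos_scores) + len(neg_scores))

# Assumed classifier scores (illustrative only).
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2, 0.1]

auc_original = auc(pos, neg)
acc_original = accuracy(pos, neg, threshold=0.5)

# Change the class prior by replicating the negatives three times.
neg_more = neg * 3
auc_shifted = auc(pos, neg_more)
acc_shifted = accuracy(pos, neg_more, threshold=0.5)
```

In this sketch the AUC is the same before and after the class balance changes, while the accuracy at the fixed threshold moves, which is the behaviour the abstract refers to.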
