Recent Advances in Discriminant Analysis for High-dimensional Data Classification

Herbert Pang, Tiejun Tong

doi:10.4172/2155-6180.1000e106

Abstract

There are serious challenges posed by high-dimensional data sets. With the arrival of new technologies, high-throughput modeling is becoming the norm in many disciplines, such as statistical genetics, epidemiology, astronomy, high energy physics, and ecology. High-dimensional data have emerged from various sources, including digital images, documents, next-generation sequencing, mass spectrometry, metabolomics, microarrays, proteomics, online videos, and web pages.

One area with a growing need for new statistical methods and theory for high-dimensional data is the classification of subgroups. For example, cancer classification has primarily been based on the histopathological appearance of the tumor. However, patients with similar tumor appearance can have different prognoses and responses to treatment. Classifying cancer by pathological review alone may therefore give biased results and misclassify patients' tumor subtypes. Microarray data allow the simultaneous measurement of thousands of genes. These high-dimensional data have become a standard tool for biomedical studies and are now commonly collected from patients in clinical trials. The identification of informative genes may yield potential molecular markers for tumor class prediction, and correct classification can help practitioners identify the right treatment for patients.

Because of the cost and/or experimental difficulty of obtaining sufficient biological material, it is common to see studies in which the sample size is much smaller than the number of dimensions. These are referred to as "large p small n" problems, where p is the number of dimensions (for example, genes) and n is the sample size. High-dimensional data pose challenges to traditional statistical methods. For instance, because n is small, the standard estimates of parameters such as means and variances carry increased uncertainty. As a consequence, statistical analyses based on such parameter estimates are usually unreliable. To improve parameter estimation, researchers have developed innovative approaches to this problem.

Highlights

There are serious challenges posed by high-dimensional data sets
Pang [7] applied the shrinkage estimates of variances in Tong [9] to the diagonal discriminant scores, and formed two shrinkage-based rules called Shrinkage-based Diagonal Quadratic Discriminant Analysis (SDQDA) and Shrinkage-based Diagonal Linear Discriminant Analysis (SDLDA)
The assumptions made in the diagonal discriminant analysis and its variations may not be realistic
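
As a rough illustration of the diagonal discriminant rules mentioned in the highlights, the sketch below scores each class using only per-feature means and variances, so no full covariance matrix has to be estimated when p is much larger than n. It is a minimal sketch under stated assumptions, not the method of Pang [7]: the function names (`shrink_variances`, `fit_diagonal_rule`, `predict`), the parameter `alpha`, and the simulated data are all hypothetical, and the shrinkage step is a simple linear pull toward the average variance, used here as a stand-in for the estimator of Tong [9].

```python
import numpy as np


def shrink_variances(s2, alpha=0.5):
    """Stand-in shrinkage (an assumption, NOT the estimator of Tong [9]):
    pull each per-feature variance toward the average variance."""
    return (1.0 - alpha) * s2 + alpha * s2.mean()


def fit_diagonal_rule(X, y, pooled=True, alpha=0.5):
    """Fit a diagonal discriminant rule.

    pooled=True  -> one variance per feature shared by all classes (DLDA-like)
    pooled=False -> one variance per feature per class (DQDA-like)
    """
    classes, idx = np.unique(y, return_inverse=True)
    n, _ = X.shape
    k = len(classes)
    means = np.array([X[idx == c].mean(axis=0) for c in range(k)])
    priors = np.array([(idx == c).mean() for c in range(k)])
    if pooled:
        resid = X - means[idx]                    # within-class residuals
        s2 = (resid ** 2).sum(axis=0) / (n - k)   # pooled per-feature variance
        variances = np.tile(shrink_variances(s2, alpha), (k, 1))
    else:
        variances = np.array(
            [shrink_variances(X[idx == c].var(axis=0, ddof=1), alpha)
             for c in range(k)]
        )
    return classes, means, variances, priors


def predict(model, X):
    """Assign each row of X to the class with the smallest discriminant score."""
    classes, means, variances, priors = model
    diff = X[:, None, :] - means[None, :, :]      # shape (n, K, p)
    scores = (diff ** 2 / variances[None, :, :]).sum(axis=2)
    scores += np.log(variances).sum(axis=1)[None, :]  # constant across classes if pooled
    scores -= 2.0 * np.log(priors)[None, :]
    return classes[scores.argmin(axis=1)]


# Simulated "large p small n" data: 20 training samples, 200 features,
# of which only the first 20 differ between the two classes.
rng = np.random.default_rng(0)
p, n_per_class, shift = 200, 10, 3.0
mu1 = np.zeros(p)
mu1[:20] = shift
X_train = np.vstack([rng.normal(0.0, 1.0, (n_per_class, p)),
                     rng.normal(mu1, 1.0, (n_per_class, p))])
y_train = np.repeat([0, 1], n_per_class)
X_new = np.vstack([rng.normal(0.0, 1.0, (50, p)),
                   rng.normal(mu1, 1.0, (50, p))])
y_new = np.repeat([0, 1], 50)

pred_dlda = predict(fit_diagonal_rule(X_train, y_train, pooled=True), X_new)
pred_dqda = predict(fit_diagonal_rule(X_train, y_train, pooled=False), X_new)
acc_dlda = (pred_dlda == y_new).mean()
acc_dqda = (pred_dqda == y_new).mean()
```

Treating features as independent is what keeps the rule usable when n is far smaller than p, and it is also the assumption the last highlight flags as potentially unrealistic.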


Journal: Journal of Biometrics & Biostatistics
Publication Date: Jan 1, 2012
Citations: 12
License type: cc-by
