Principal component analysis for designed experiments.

Tomokazu Konishi

doi:10.1186/1471-2105-16-s18-s7

Abstract

Background

Principal component analysis is used to summarize matrix data, such as transcriptome, proteome, metabolome, or medical examination data, into fewer dimensions by fitting the matrix to orthogonal axes. Although this methodology is frequently used in multivariate analyses, it has disadvantages when applied to experimental data. First, the identified principal components have poor generality: because the size and directions of the components depend on the particular data set, the components are valid only within that data set. Second, the method is sensitive to experimental noise and to bias between sample groups. It cannot reflect an experimental design planned to manage such noise and bias; rather, it assigns the same weight and independence to all the samples in the matrix. Third, the resulting components are often difficult to interpret. To address these issues, several options were introduced to the methodology. First, the principal axes were identified using training data sets and shared across experiments. These training data reflect the design of experiments, and their preparation allows noise to be reduced and group bias to be removed. Second, the center of the rotation was determined in accordance with the experimental design. Third, the resulting components were scaled to unify their size unit.

Results

The effects of these options were observed in microarray experiments, which showed improved separation of groups and robustness to noise. The range of scaled scores was unaffected by the number of items. Additionally, unknown samples were appropriately classified using pre-arranged axes. Furthermore, these axes reflected the characteristics of the groups in the experiments well. The scaling of the components and the sharing of axes enabled the components to be compared across experiments. The use of training data reduced the effects of noise and bias in the data, facilitating the physical interpretation of the principal axes.

Conclusions

Together, these options improve the generality and objectivity of the analytical results. The methodology thus becomes more like a set of multiple regression analyses that find independent models, each specifying one of the axes.
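The three options listed in the Background can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes an items x samples matrix, takes the axes from the SVD of the centered training-group means, lets the caller choose the center of rotation, and applies a hypothetical 1/sqrt(number of items) scaling. The helper names `fit_axes`, `project` and `classify`, the nearest-group-mean rule, and the toy data are all made up for this example.

```python
import numpy as np

def fit_axes(train, groups, center):
    """Principal axes from the group means of a training matrix.

    train  : items x samples array (e.g. genes x arrays)
    groups : one group label per training sample (column)
    center : item vector used as the center of rotation (a design choice)
    """
    groups = np.asarray(groups)
    labels = sorted(set(groups.tolist()))
    # Averaging replicates within each group damps sample-level noise and
    # gives every group equal weight, whatever its number of replicates.
    means = np.column_stack(
        [train[:, groups == lab].mean(axis=1) for lab in labels]
    )
    U, _, _ = np.linalg.svd(means - center[:, None], full_matrices=False)
    return U, labels, means

def project(samples, U, center, k=2):
    """Scores of samples (columns) on the first k shared axes."""
    scores = U[:, :k].T @ (samples - center[:, None])
    # Hypothetical scaling: divide by sqrt(number of items) so the score
    # range does not grow with the number of items measured.
    return scores / np.sqrt(samples.shape[0])

def classify(samples, U, center, means, labels, k=2):
    """Assign each sample to the nearest group mean in score space."""
    s = project(samples, U, center, k)        # k x n_samples
    ref = project(means, U, center, k)        # k x n_groups
    dist = np.linalg.norm(s[:, :, None] - ref[:, None, :], axis=0)
    return [labels[i] for i in dist.argmin(axis=1)]

# Toy training data: 5 items, groups A, B, C, two replicates each.
profiles = 5.0 * np.eye(5)[:, :3]
train = np.repeat(profiles, 2, axis=1)
train[:, ::2] += 0.1                      # made-up replicate noise
train[:, 1::2] -= 0.1
groups = ["A", "A", "B", "B", "C", "C"]
center = train.mean(axis=1)               # balanced design: mean of group means

U, labels, means = fit_axes(train, groups, center)
unknown = np.column_stack([profiles[:, 1] + 0.2, profiles[:, 0] - 0.3])
print(classify(unknown, U, center, means, labels))  # ['B', 'A']
```

Because `U` and `center` come only from the training data, samples from a later experiment can be projected onto the same axes and their scores compared directly; nearest-group-mean assignment is just one simple rule chosen here.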

Highlights

Principal component analysis is used to summarize matrix data, such as transcriptome, proteome, metabolome, or medical examination data, into fewer dimensions by fitting the matrix to orthogonal axes
Improvement in separating repeating groups: the effects of using training data and of focusing on the genes that were positive in the test were investigated, with a view toward group separation, in mammary gland development data from Anderson et al. (2007) [14]
Compared with the original method that found axes for the full data matrix X (Figure 1A), the separation of groups was improved when axes were found for selected genes (Figure 1B) or for the training data (Figure 1C)
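The group-separation comparison in the highlights can be illustrated with a toy matrix. In this sketch the data, the choice of row 0 as the only "positive" gene, and the between-group over within-group sum-of-squares ratio used as the separation measure are all assumptions for illustration, not the paper's data or metric.

```python
import numpy as np

def pc1_scores(X, rows=None):
    """Sample scores on PC1, with the axis found from all rows or a subset."""
    sub = X if rows is None else X[rows]
    sub = sub - sub.mean(axis=1, keepdims=True)   # center each item
    U, _, _ = np.linalg.svd(sub, full_matrices=False)
    return U[:, 0] @ sub                          # sign of PC1 is arbitrary

def separation(scores, groups):
    """Between-group / within-group sum of squares; larger is better."""
    groups = np.asarray(groups)
    grand = scores.mean()
    between = within = 0.0
    for g in np.unique(groups):
        s = scores[groups == g]
        between += s.size * (s.mean() - grand) ** 2
        within += ((s - s.mean()) ** 2).sum()
    return between / within

# Toy items x samples matrix: row 0 carries the group difference,
# row 1 is large noise unrelated to the groups.
X = np.array([[1.1, 0.9, -0.9, -1.1],
              [4.0, -4.0, 4.0, -4.0]])
groups = ["ctrl", "ctrl", "treat", "treat"]

full = separation(pc1_scores(X), groups)
selected = separation(pc1_scores(X, rows=[0]), groups)
print(f"full: {full:.2g}, selected: {selected:.3g}")
```

On the full matrix PC1 follows the noisy row, so the groups overlap; restricting the axis search to the selected row lets PC1 follow the group difference. With real data the gain depends on how well the selected genes capture the groups.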

Summary

Introduction

Principal component analysis is used to summarize matrix data, such as transcriptome, proteome, metabolome, or medical examination data, into fewer dimensions by fitting the matrix to orthogonal axes. Although this methodology is frequently used in multivariate analyses, it has disadvantages when applied to experimental data. Here, the principal axes were identified using training data sets and shared across experiments. These training data reflect the design of experiments, and their preparation allows noise to be reduced and group bias to be removed. In the singular value decomposition of the data matrix, X = UDV^T, the unitary matrices U and V control the directions of the principal axes, while the diagonal matrix D records the singular values.
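The roles of U, D and V in X = UDV^T can be checked numerically. This is a minimal sketch, assuming X has items (e.g. genes) in rows and samples in columns and is centered on the mean sample; the random toy matrix is made up, and NumPy returns the diagonal of D as a vector `d` and V^T as `Vt`.

```python
import numpy as np

# Made-up toy matrix: 6 items (rows) x 4 samples (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# Center each item across samples so the rotation is about the mean sample
# (the conventional choice; a designed experiment may use another center).
Xc = X - X.mean(axis=1, keepdims=True)

# Thin SVD: Xc = U @ diag(d) @ Vt, singular values sorted in decreasing order.
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)

# Columns of U are the principal axes in item space; projecting the samples
# onto them gives the sample scores, which equal diag(d) @ Vt.
scores = U.T @ Xc

# Share of the total variation carried by each component.
explained = d**2 / np.sum(d**2)
print(np.round(explained, 3))
```

Because the axes in U are unit vectors, the size of each component is carried entirely by its singular value in D, which is why the scores depend on the particular data set unless they are rescaled.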



Journal: BMC Bioinformatics	Publication Date: Dec 1, 2015
Citations: 61	License type: cc-by
