Abstract

Elaborate downstream methods are required to analyze large microarray data-sets. When the end goal is to look for relationships between (or patterns within) different subgroups, or even individual samples, large data-sets must first be filtered using statistical thresholds in order to reduce their overall volume. In anthropological microarray studies, for example, such ‘dimension reduction’ techniques are essential to elucidate links between polymorphisms and phenotypes in given populations. In such large data-sets, a subset can first be taken to represent the larger data-set, much as polling results taken during elections are used to infer the opinions of the population at large. However, what is the best and easiest way to capture a subset of the variation in a data-set that can represent the overall portrait of variation? In this article, principal components analysis (PCA) is discussed in detail, including its history, the mathematics behind the process, and the ways in which it can be applied to modern large-scale biological data-sets. New methods of analysis using PCA are also suggested, with tentative results outlined.

Highlights

  • Principal components analysis and other multivariate tools are used to analyze large volumes of data in order to tease out the differences/relationships between the logical entities being analyzed [1]

  • On the Affymetrix single nucleotide polymorphism (SNP) Array 6.0, 762,463 markers target known genes listed in the RefSeq gene database

  • The data generated through principal components analysis (PCA) were channelled through the haplotype-tagging copy number variant (htCNV) pipeline, which reduced them further to 4,594



Introduction

Principal components analysis and other multivariate tools are used to analyze large volumes of data in order to tease out the differences/relationships between the logical entities being analyzed (for example, a data-set consisting of a large number of samples, each with their own data points/variables) [1]. PCA extracts the fundamental structure of the data without the need to build any model to represent it [2]. Examples include craniofacial recognition [5], analysis of water quality [3], and the derivation of a set of high-confidence genes [6] or single nucleotide polymorphisms (SNPs) [7, 8] for classification purposes. It has been used in subject areas such as climatology, geology, meteorology, psychology, quality control [4], forensics and population genetics (in relation to SNPs), medical genetics [2], and bacteriology [9]. Du [11] successfully adapted and applied PCA to protein data in the form of Amino Acid PCA (AAPCA), where the aim was to classify proteins into structural classes; Li [12] combined
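
To make the dimension-reduction idea concrete, the sketch below applies off-the-shelf PCA (NumPy and scikit-learn) to a simulated genotype matrix in which two subgroups differ at a small block of markers. All sizes, allele frequencies, and variable names are hypothetical choices for illustration only; this is not the Affymetrix or htCNV workflow referred to in the highlights.

# A minimal, illustrative sketch of PCA as model-free dimension reduction on a
# SNP-style genotype matrix (samples x markers, coded 0/1/2). The simulated
# data and all sizes below are hypothetical, not the study's actual pipeline.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_samples, n_markers = 100, 5_000            # toy dimensions for illustration

# Simulate two subgroups whose allele frequencies differ at the first 200 markers.
freqs = np.full((2, n_markers), 0.5)
freqs[1, :200] = 0.9
groups = rng.integers(0, 2, size=n_samples)  # subgroup label per sample
genotypes = rng.binomial(2, freqs[groups]).astype(float)

# Centre each marker; PCA then finds the directions (principal components)
# that capture the largest share of the genotype variance.
genotypes -= genotypes.mean(axis=0)
pca = PCA(n_components=10)
scores = pca.fit_transform(genotypes)        # samples projected onto the PCs

print(f"{n_markers} markers summarised by {scores.shape[1]} principal components")
print("variance explained:", np.round(pca.explained_variance_ratio_[:3], 3))
print("mean PC1 score per simulated subgroup:",
      np.round([scores[groups == g, 0].mean() for g in (0, 1)], 2))

In this toy setting the leading component alone recovers the subgroup structure, which is the sense in which a handful of components can stand in for hundreds of thousands of markers when looking for relationships between subgroups or individual samples.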

