Abstract

Genotyping platforms such as Affymetrix can be used to assess genotype-phenotype as well as copy number-phenotype associations at millions of markers. While genotyping algorithms are largely concordant when assessed on HapMap samples, tools to assess copy number changes are more variable and often discordant. One explanation for the discordance is that copy number estimates are susceptible to systematic differences between groups of samples that were processed at different times or by different labs. Analysis algorithms that do not adjust for these batch effects are prone to spurious measures of association. The R package crlmm implements a multilevel model that adjusts for batch effects and provides allele-specific estimates of copy number. This paper illustrates a workflow for the estimation of allele-specific copy number and the integration of the marker-level estimates with complementary Bioconductor software for inferring regions of copy number gain or loss. All analyses are performed in the statistical environment R.

Highlights

  • Duplications and deletions spanning kilobases of the genome contribute to a substantial proportion of the genetic variation between individuals

  • We have applied the crlmm software to the HapMap phase 3 data, illustrating the steps of preprocessing, the genotyping of polymorphic markers, and the estimation of allele-specific copy number

  • We organize the normalized intensities, statistical summaries from the genotyping and copy number estimation steps, and meta-data on the features and samples in a single container. This container extends the eSet class defined in Biobase, with additional slots to accommodate batch-specific statistical summaries relevant for copy number analyses

Introduction

Duplications and deletions spanning kilobases of the genome contribute to a substantial proportion of the genetic variation between individuals. Current estimates of the frequency and size of segmental duplications and deletions in the human genome are largely based on high-throughput arrays that quantitate copy number on a genomic scale. Two such technologies are array comparative genomic hybridization (aCGH) and genotyping platforms such as the Affymetrix oligonucleotide arrays and the Illumina BeadArrays. This paper describes software for the first stage of a two-stage approach for identifying copy number variants (CNV) from high-throughput genotyping arrays.

Preprocessing and genotyping
Locus-level copy number estimation
Downstream tools
Discussion
Session information