Abstract

Background

AluScan combines inter-Alu PCR, using multiple Alu-based primers with opposite orientations, with next-generation sequencing to capture a large number of Alu-proximal genomic sequences for investigation. Because it requires only sub-microgram quantities of DNA, it facilitates the examination of large numbers of samples. However, the special features of AluScan data make it difficult to call copy number variations (CNVs) directly with the calling algorithms designed for whole genome sequencing (WGS) or exome sequencing.

Results

In this study, an AluScanCNV package was assembled for efficient CNV calling from AluScan sequencing data. It applies a Geary-Hinkley transformation (GHT) to read-depth ratios, either between paired test and control samples or between a test sample and a reference template constructed from reference samples, to call localized CNVs. A GISTIC-like algorithm is then used to identify recurrent CNVs, and circular binary segmentation (CBS) to reveal large extended CNVs. To evaluate the utility of CNVs called from AluScan data, the AluScans from 23 non-cancer and 38 cancer genomes were analyzed. The glioma samples yielded the familiar extended copy-number losses on chromosomes 1p and 9. Also, the recurrent somatic CNVs identified from liver cancer samples were similar to those reported for liver cancer WGS, with a striking enrichment of copy-number gains on chromosomes 1q and 8q. When localized or recurrent CNV-features capable of distinguishing liver from non-liver cancer samples were selected by correlation-based machine learning, the liver and non-liver cancer classes were separated with high accuracy.

Conclusions

The results obtained from non-cancer and cancerous tissues indicated that the AluScanCNV package can be employed to call localized, recurrent and extended CNVs from AluScan sequences. Moreover, both the localized and recurrent CNVs identified by this method could be subjected to machine-learning selection to yield distinguishing CNV-features capable of separating liver cancers from other types of cancers. Since the method is applicable to any human DNA sample, with or without a paired control, it can also be employed to analyze the constitutional CNVs of individuals.

Electronic supplementary material

The online version of this article (doi:10.1186/s13336-014-0015-z) contains supplementary material, which is available to authorized users.
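The abstract says distinguishing CNV-features were chosen by correlation-based machine learning but does not name the algorithm. Purely as an illustration, the sketch below implements one well-known correlation-based scheme, Hall's correlation-based feature selection (CFS) merit, Merit = k·r̄_cf / sqrt(k + k(k−1)·r̄_ff), with a greedy forward search. It uses absolute Pearson correlation between 0/1 CNV-feature calls and a 0/1 class label (liver vs non-liver). Whether the AluScanCNV study used exactly this procedure is an assumption; the function names, feature names and data are hypothetical.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def cfs_merit(subset, features, labels):
    """Hall's CFS merit: reward feature-class correlation, penalize redundancy."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], labels)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(features, labels):
    """Greedily add the feature that most raises the merit; stop when it no longer rises."""
    selected, best = [], 0.0
    remaining = sorted(features)  # sorted for a deterministic tie-break
    while remaining:
        cand, merit = max(
            ((f, cfs_merit(selected + [f], features, labels)) for f in remaining),
            key=lambda pair: pair[1],
        )
        if merit <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = merit
    return selected
```

With a toy label vector `[1, 1, 1, 0, 0, 0]`, a perfectly informative feature, an exact duplicate of it, and a weakly correlated one, the search keeps only the first informative feature: adding the duplicate does not raise the merit because the redundancy term cancels its gain.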

Highlights

  • The use of microarray platforms to perform copy number variation (CNV) calling is a valuable technique in genomic analysis

  • Both the localized and recurrent CNVs identified by this method could be subjected to machine-learning selection to yield distinguishing CNV-features capable of separating liver cancers from other types of cancers

  • In Additional file 4: Figure S1, a mismatch was introduced: the AluScan for the test sample used only three Alu-based primers, whereas the reference-sample AluScans used four. Without GC normalization, the distribution of t values deviated markedly from a normal curve; with GC normalization, the fit improved substantially, indicating that GC normalization enhanced the robustness of Geary-Hinkley transformation (GHT)-based CNV calling
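The highlight above credits GC normalization with making GHT-based calling robust to a primer mismatch, but the normalization scheme is not described here. As an assumption, the sketch below uses one common approach for read-depth data: bins are grouped into GC-content strata, and each bin's depth is rescaled so that every stratum's median equals the overall median depth. The function name `gc_normalize` and the stratum count `n_strata` are hypothetical, not part of AluScanCNV.

```python
from collections import defaultdict
from statistics import median


def gc_normalize(depths, gc_fractions, n_strata=20):
    """Median-scaling GC correction (an assumed scheme, not necessarily AluScanCNV's).

    Each bin's depth is multiplied by (overall median depth) / (median depth of
    the bins in its GC stratum), so every GC stratum ends up with the same median.
    """
    # Assign each bin to a GC stratum; gc == 1.0 is clamped into the top stratum.
    strata = [min(int(gc * n_strata), n_strata - 1) for gc in gc_fractions]
    by_stratum = defaultdict(list)
    for depth, s in zip(depths, strata):
        by_stratum[s].append(depth)
    stratum_median = {s: median(v) for s, v in by_stratum.items()}
    overall = median(depths)
    return [
        depth * overall / stratum_median[s] if stratum_median[s] > 0 else 0.0
        for depth, s in zip(depths, strata)
    ]
```

For example, two low-GC bins with depths 10 and 12 and two high-GC bins with depths 20 and 24 are mapped onto a common scale, so the first bin of each pair (and the second of each pair) end up equal. In a pipeline like the one the highlight describes, such a step would run on both test and reference depths before any ratio is formed.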


Summary

Introduction

The use of microarray platforms to perform copy number variation (CNV) calling is a valuable technique in genomic analysis. AluScan is a method for the genome-wide capture, for next-generation sequencing, of sequences amplified by inter-Alu PCR using multiple Alu-based primers with opposite ‘head type’ and ‘tail type’ orientations. It is expeditious in both experimental and informatics analysis, and requires less DNA than whole genome sequencing (WGS) or exome sequencing. Exome sequencing usually targets essentially the same set of fixed regions in every experiment, so CNV calling on an unpaired sample can be performed without any control [7]; in contrast, the inter-Alu sequences analyzed by AluScan depend on the Alu-based PCR primers employed. These special features of AluScan data make it difficult to call CNVs directly with the calling algorithms designed for WGS or exome sequencing.
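Because the captured regions depend on the primers used, the abstract describes comparing each test sample with either a paired control or a reference template built from reference samples, and converting read-depth ratios with a Geary-Hinkley transformation (GHT). For a ratio w = x/y of approximately normal variables, the GHT t = (μ_y·w − μ_x) / sqrt(σ_y²w² − 2ρσ_xσ_y·w + σ_x²) is approximately standard normal when y is unlikely to be ≤ 0. The sketch below applies this under stated assumptions: bins are pre-aligned and already normalized, the template stores each bin's mean and sample SD across at least two references, the test bin is assumed to share that SD, the template mean has SD σ/√n, and test and template are independent (ρ = 0). The names `build_template`, `ght_t`, `call_bins` and the cutoff `z_cut` are hypothetical and not taken from the AluScanCNV package.

```python
import math
from statistics import mean, stdev


def build_template(ref_depths):
    """Per-bin (mean, sample SD) across reference samples.

    ref_depths: one list of normalized read depths per reference sample,
    all in the same bin order (hypothetical input layout; needs >= 2 samples).
    """
    return [(mean(col), stdev(col)) for col in zip(*ref_depths)]


def ght_t(w, mu_x, sd_x, mu_y, sd_y, rho=0.0):
    """Geary-Hinkley transform of an observed ratio w = x / y."""
    var = sd_y ** 2 * w ** 2 - 2 * rho * sd_x * sd_y * w + sd_x ** 2
    return (mu_y * w - mu_x) / math.sqrt(var)


def call_bins(test_depths, template, n_ref, z_cut=3.0):
    """Return (label, t) per bin; label is 'gain', 'loss', 'neutral' or None."""
    calls = []
    for x, (mu, sd) in zip(test_depths, template):
        if mu <= 0 or sd <= 0:
            calls.append((None, float("nan")))  # bin unusable for a ratio test
            continue
        w = x / mu  # observed ratio: test depth over template mean
        # Null (copy-neutral) hypothesis: test and template share the mean mu.
        # Assumed: test bin SD = sd; SD of the template mean = sd / sqrt(n_ref).
        t = ght_t(w, mu_x=mu, sd_x=sd, mu_y=mu, sd_y=sd / math.sqrt(n_ref))
        label = "gain" if t > z_cut else "loss" if t < -z_cut else "neutral"
        calls.append((label, t))
    return calls
```

With four reference samples whose three bins average 100, 50 and 80, a test sample with depths 100, 100 and 40 is labeled neutral, gain and loss respectively. A fixed `z_cut` ignores multiple testing across many bins; a real pipeline would need to choose its threshold with that in mind.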

Methods
Results
Conclusion
