Abstract 5143: From sequencing data to mutation spectra: a high throughput analysis pipeline.

David R Jones, Susanna L Cooke, Jorge Zamora, Manasa Ramakrishna, Serena Nik-Zainal, Helen Davies, Elli Papaemmanuil, Serge Dronov, Jonathan Hinton, Michael R Stratton, Jon W Teague, Andrew Menzies, Lucy Stebbings, John Marshall, Alagu Jayakumar, Catherine Leroy, Keiran Raine, Ultan McDermott, Adam Butler, Peter J Campbell

doi:10.1158/1538-7445.am2013-5143

Abstract

Advances in massively parallel sequencing technology have revolutionized the way we characterise cancer genomes and provided significant new insights into the mechanisms that underpin oncogenesis. A diverse range of mutation types, including single base-pair changes, insertions, deletions, copy number alterations and larger structural variations, are common in cancer genomes. To rapidly and accurately screen next generation sequencing data for these somatic mutations in cancer, the Cancer Genome Project (CGP) has developed a high throughput analysis pipeline utilising a suite of analysis software developed by the group. The pipeline is built around a compute farm of ∼2,000 nodes and a Lustre filesystem; raw data files (BAM etc.), analysis result files and version information are stored and tracked in our archive/storage system, FileTrk. Lane data is aligned using Burrows-Wheeler Aligner (BWA), and web interfaces have been developed to allow scientific staff to rapidly QC aligned lanes. Once lanes have passed QC and the desired coverage is reached, they are merged into a single sample BAM file and the sample is then ready for analysis. In-house algorithms are used to detect point mutations (CaVEMan), structural variation breakpoints (Brass) and copy number changes (ASCAT and PICNIC), whilst Pindel is used to detect small insertions/deletions. Post-processing filters then remove false positives and the results are uploaded into a database. Mutations are annotated at the protein and RNA levels using standard nomenclature (Vagrent, in-house software). Downstream analysis software has been developed (CANDI, in-house software) which produces a range of plots to aid visualisation of mutation context and mutation spectra patterns in related cancer samples. Current IT development is focussed on converting the pipeline to produce and store VCF output, incorporating further downstream analysis software and automating data export to COSMIC and the ICGC data portal.
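The abstract does not describe how CANDI computes mutation context or mutation spectra. As an illustration only, the sketch below uses the widely used convention of reporting each single base substitution on the pyrimidine strand together with its flanking bases (the 96-channel trinucleotide spectrum). The function names and the input format (a trinucleotide context plus the mutant base) are assumptions made for this example, not part of the CGP pipeline.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def classify_substitution(context: str, alt: str) -> str:
    """Label a substitution by its trinucleotide context.

    `context` is the reference trinucleotide with the mutated base in the
    middle; `alt` is the mutant base. Substitutions at purine reference
    bases are flipped to the opposite strand so that the reported reference
    base is always C or T, e.g. 'A[C>T]G'.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or any(b not in COMPLEMENT for b in context + alt):
        raise ValueError("expected a 3-base context and a single A/C/G/T alt")
    if len(alt) != 1 or context[1] == alt:
        raise ValueError("alt must be one base that differs from the reference")
    if context[1] in "AG":  # purine reference: report on the other strand
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def mutation_spectrum(calls):
    """Count substitutions per context class for (context, alt) pairs."""
    return Counter(classify_substitution(ctx, alt) for ctx, alt in calls)


# Hypothetical calls: two of them are the same event seen on opposite strands.
example_calls = [("ACG", "T"), ("CGT", "A"), ("TTA", "C")]
spectrum = mutation_spectrum(example_calls)
```

Collapsing both strands onto the pyrimidine reference means a C>T change and its G>A counterpart fall into the same class, which is what makes spectra comparable across samples.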
Citation Format: David Jones, Adam P. Butler, Jon W. Teague, Keiran M. Raine, Andrew Menzies, John Marshall, Jonathan Hinton, Serge Dronov, Lucy Stebbings, Alagu Jayakumar, Catherine Leroy, Jorge Zamora, Manasa Ramakrishna, Elli Papaemmanuil, Helen Davies, Susanna L. Cooke, Serena Nik-Zainal, Ultan McDermott, Michael R. Stratton, Peter Campbell. From sequencing data to mutation spectra: a high throughput analysis pipeline. [abstract]. In: Proceedings of the 104th Annual Meeting of the American Association for Cancer Research; 2013 Apr 6-10; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2013;73(8 Suppl):Abstract nr 5143. doi:10.1158/1538-7445.AM2013-5143

Similar Papers

Abstract 3967: The Cancer Genome Project high throughput analysis pipeline
Adam P Butler ... Ultan McDermott
Cancer Research | VOL. 72
15 Apr 2012

Abstract 93: COSMIC: The catalogue of somatic mutations in cancer receives full genome variant annotations
Simon A Forbes ... Sally Bamford
Cancer Research | VOL. 70
15 Apr 2010

Abstract SY25-01: Analysis of next-generation sequencing data for cancer genomes: challenges and pitfalls
Jianmin Wang ... Jinghui Zhang
Cancer Research | VOL. 72
15 Apr 2012

Abstract 5142: COSMIC: Exploring the world's knowledge of somatic mutations in cancer.
Kenric Leung ... Charlotte Cole
Cancer Research | VOL. 73
15 Apr 2013
