Abstract

Background

High throughput sequencing requires bioinformatics pipelines to process large volumes of data into meaningful variants that can be translated into a clinical report. These pipelines often suffer from a number of shortcomings: they lack robustness and have many components written in multiple languages, each with a variety of resource requirements. Pipeline components must be linked together with a workflow system to process FASTQ files through to a VCF file of variants. Crafting these pipelines requires considerable bioinformatics and IT skills beyond the reach of many clinical laboratories.

Results

Here we present Canary, a single program that can be run on a laptop, which takes FASTQ files from amplicon assays through to an annotated VCF file ready for clinical analysis. Canary can be installed and run with a single command using Docker containerization or run as a single JAR file on a wide range of platforms. Although it is a single utility, Canary performs all the functions present in more complex and unwieldy pipelines. All variants identified by Canary are 3′ shifted and represented in their most parsimonious form to provide a consistent nomenclature, irrespective of sequencing variation. Further, proximate in-phase variants are represented as a single HGVS ‘delins’ variant. This allows for correct nomenclature and consequences to be ascribed to complex multi-nucleotide polymorphisms (MNPs), which are otherwise difficult to represent and interpret. Variants can also be annotated with hundreds of attributes sourced from MyVariant.info to give up-to-date details on pathogenicity, population statistics and in silico predictors.

Conclusions

Canary has been used at the Peter MacCallum Cancer Centre in Melbourne for the last 2 years for the processing of clinical sequencing data. By encapsulating clinical features in a single, easily installed executable, Canary makes sequencing more accessible to all pathology laboratories. Canary is available for download as source or a Docker image at https://github.com/PapenfussLab/Canary under a GPL-3.0 license.
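
The two normalization steps named in the abstract, 3′ shifting and merging proximate in-phase variants into one 'delins', can be illustrated with a minimal Python sketch. This is not Canary's implementation. The function names, the 0-based coordinates, and the restriction to the forward strand (so that 3′ means rightwards along the reference) are assumptions made purely for illustration.

```python
def shift_deletion_3prime(ref: str, pos: int, length: int) -> int:
    """Return the right-most (3') equivalent start of a deletion.

    Hypothetical helper: `pos` is the 0-based start of the deleted bases.
    """
    # Deleting ref[pos:pos+length] and ref[pos+1:pos+1+length] gives the same
    # sequence whenever the base leaving the window equals the base entering it.
    while pos + length < len(ref) and ref[pos] == ref[pos + length]:
        pos += 1
    return pos


def shift_insertion_3prime(ref: str, pos: int, seq: str) -> tuple[int, str]:
    """Return the right-most (3') equivalent of inserting `seq` before ref[pos]."""
    # Rotate the inserted sequence one base at a time while it matches the reference.
    while pos < len(ref) and seq[0] == ref[pos]:
        seq = seq[1:] + seq[0]
        pos += 1
    return pos, seq


def merge_in_phase(ref: str, variants: list[tuple[int, str, str]]) -> tuple[int, str, str]:
    """Merge non-overlapping variants on one haplotype into a single delins.

    Each variant is (0-based position, reference allele, alternate allele).
    Returns (start, deleted reference bases, inserted bases).
    """
    variants = sorted(variants)
    start = variants[0][0]
    end = max(p + len(r) for p, r, _ in variants)
    alt_parts = []
    cursor = start
    for p, r, a in variants:
        if ref[p:p + len(r)] != r:
            raise ValueError(f"reference allele mismatch at {p}")
        alt_parts.append(ref[cursor:p])  # unchanged bases between variants
        alt_parts.append(a)              # the variant's alternate allele
        cursor = p + len(r)
    alt_parts.append(ref[cursor:end])
    return start, ref[start:end], "".join(alt_parts)


if __name__ == "__main__":
    # A single T deleted from a run of four Ts is reported at the last T.
    print(shift_deletion_3prime("GATTTTC", 2, 1))
    # A CA insertion in a CA repeat is reported after the last repeat unit.
    print(shift_insertion_3prime("GACACAT", 2, "CA"))
    # Two SNVs one base apart on the same haplotype become one delins.
    print(merge_in_phase("ACGTACGT", [(2, "G", "T"), (4, "A", "C")]))
```

Under these assumptions every equivalent placement of an indel collapses to one coordinate, which is what makes the nomenclature consistent across runs. A variant on a reverse-strand transcript would need the shift applied in the opposite genomic direction.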

Highlights

  • High throughput sequencing requires bioinformatics pipelines to process large volumes of data into meaningful variants that can be translated into a clinical report

  • We introduce Canary, a stand-alone Java utility that performs the function of multi-tool pipelines and can generate annotated Variant Call Format (VCF) files directly from zipped FASTQ files generated from amplicon assays

  • Canary reduces the pipeline to a single command that takes zipped FASTQ files through to an annotated VCF file suitable for clinical curation


Summary

Results

To assess the performance of Canary in both germline and somatic contexts, three experiments were performed with well-studied samples containing known variants. The samples were run on an Illumina MiSeq sequencer and the reads converted to paired-end FASTQ files. These files were run through the three pipelines described above, except that the Canary pipeline called variants down to a variant allele frequency of 1%. Canary typically takes between 7 and 10 min to process a full Illumina MiSeq run of 48 samples (22 patient samples in replicate and 4 controls), performing alignment, variant calling and annotation on a computing cluster. These times are for an in-house myeloid assay of 216 amplicons covering key exons of 26 genes, with a total panel size of 29.9 kilobases. The average number of read pairs per sample was 375,522 and the average cache hit ratio was 20.5%.
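
As a small illustration of the 1% threshold mentioned above, the following Python sketch computes a variant allele frequency from read counts and applies a cutoff. The function name, its parameters and the example read counts are hypothetical and are not taken from Canary.

```python
def passes_vaf_threshold(alt_reads: int, total_reads: int, min_vaf: float = 0.01) -> bool:
    """Return True when the variant allele frequency meets the cutoff.

    Hypothetical helper: VAF is taken as supporting reads over total reads
    covering the position.
    """
    if total_reads <= 0:
        return False  # no coverage, so no call
    return alt_reads / total_reads >= min_vaf


if __name__ == "__main__":
    print(passes_vaf_threshold(20, 1000))  # 2% VAF
    print(passes_vaf_threshold(9, 1000))   # 0.9% VAF
```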

