Abstract
The highly challenging hexaploid wheat (Triticum aestivum) genome is becoming ever more accessible due to the continued development of multiple reference genomes, a factor which aids in the effort to better understand variation in important traits. Although the process of variant calling is relatively straightforward, selection of the best combination of the computational tools for read alignment and variant calling stages of the analysis and efficient filtering of the false variant calls are not always easy tasks. Previous studies have analyzed the impact of methods on the quality metrics in diploid organisms. Given that variant identification in wheat largely relies on accurate mining of exome data, there is a critical need to better understand how different methods affect the analysis of whole exome sequencing (WES) data in polyploid species. This study aims to address this by performing whole exome sequencing of 48 wheat cultivars and assessing the performance of various variant calling pipelines at their suggested settings. The results show that all the pipelines require filtering to eliminate false-positive calls. The high consensus among the reference SNPs called by the best-performing pipelines suggests that filtering provides accurate and reproducible results. This study also provides detailed comparisons for high sensitivity and precision at individual and population levels for the raw and filtered SNP calls.
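The sensitivity and precision comparisons mentioned above can be illustrated with a minimal sketch. It assumes each SNP is represented as a (chromosome, position, alternate allele) tuple and that a trusted reference SNP set is available; the function name and all values below are hypothetical and are not taken from the study's data or pipelines.

```python
def precision_sensitivity(called, reference):
    """Return (precision, sensitivity) of a SNP call set against a reference set.

    Both arguments are iterables of hashable SNP keys, here assumed to be
    (chromosome, position, alt_allele) tuples.
    """
    called, reference = set(called), set(reference)
    true_pos = len(called & reference)    # calls confirmed by the reference
    false_pos = len(called - reference)   # calls absent from the reference
    false_neg = len(reference - called)   # reference SNPs that were missed
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    sensitivity = true_pos / (true_pos + false_neg) if reference else 0.0
    return precision, sensitivity


# Hypothetical toy data: four reference SNPs, one raw and one filtered call set.
reference_snps = {("1A", 1200, "G"), ("1B", 530, "T"), ("2D", 88, "A"), ("3A", 4100, "C")}
raw_calls = {("1A", 1200, "G"), ("1B", 530, "T"), ("2D", 88, "A"),
             ("5B", 77, "C"), ("7D", 9, "T")}
filtered_calls = {("1A", 1200, "G"), ("1B", 530, "T"), ("2D", 88, "A")}

raw_metrics = precision_sensitivity(raw_calls, reference_snps)
filtered_metrics = precision_sensitivity(filtered_calls, reference_snps)
print("raw:", raw_metrics)
print("filtered:", filtered_metrics)
```

In this toy case, filtering removes the two unsupported calls, so precision rises while sensitivity is unchanged. Real filters can also discard true variants, which is why both metrics are reported for raw and filtered calls.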
Highlights
As a result of its economic viability and availability of high-quality data, whole exome sequencing (WES) is rapidly becoming a standard approach for detecting gene variants, where the major challenge of accurate and reproducible variant detection has shifted toward improving computational pipelines
All 48 WES datasets were aligned to the IWGSC wheat reference genome v1.0 separately using a range of aligners outlined below
This study provided an assessment of 24 different variant calling pipelines based on the whole exome sequencing data of 48 elite wheat cultivars
Summary
Advances in next-generation sequencing technologies have paved the way for improved genomic studies, providing enormous amounts of high-quality data in a fast and affordable manner [1]. Whole exome sequencing (WES) is one such advance, which focuses on capturing only exonic regions of the genome [2], and since its development, has been widely used to identify and understand structural variations of many disease-causing mutations [3,4]. As a result of its economic viability and availability of high-quality data, WES is rapidly becoming a standard approach for detecting gene variants, where the major challenge of accurate and reproducible variant detection has shifted toward improving computational pipelines.