Abstract

Background: As whole-genome sequencing (WGS) technology has advanced, massively parallel sequencing (MPS) has remained the mainstream approach because of its accuracy, low cost, and high throughput. Developing analytical pipelines for MPS data has therefore always been of great importance. Increasingly large population genomics studies, as a specific type of big data research, pose new challenges for analysis solutions.

Results: Here, we introduce ZBOLT, a comprehensive analysis system that incorporates both software and hardware advancements, making it an appropriate choice for large-scale population genomic studies that require extensive data processing. In this study, we first evaluate ZBOLT's variant-calling accuracy using the Genome in a Bottle (GIAB) benchmark dataset. We then apply ZBOLT to a large-scale population genomics study of 5,616 samples sequenced at high depth, totaling 1.16 Pbp (peta base pairs). The results show that ZBOLT is highly efficient and has low energy consumption, processing 100 Tbp per day and consuming 1 kWh per 100 Gbp sequenced sample.

Conclusion: This research serves as a valuable reference for analyzing sequencing data from large population cohorts and underscores the significant potential of ZBOLT in large-scale population genomics studies.
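To put the quoted figures in proportion, the following is a minimal back-of-the-envelope sketch, not a calculation taken from the study. It assumes decimal SI prefixes (1 Pbp = 10^15 bp) and assumes that throughput and energy use scale linearly with the number of bases processed, which the abstract does not state. The function name `cohort_estimates` and its parameters are hypothetical.

```python
# Figures quoted in the abstract; decimal SI prefixes are an assumption.
TOTAL_BP = 1.16e15               # 1.16 Pbp across the whole cohort
N_SAMPLES = 5616                 # number of samples in the cohort
THROUGHPUT_BP_PER_DAY = 100e12   # 100 Tbp processed per day
KWH_PER_100_GBP = 1.0            # 1 kWh per 100 Gbp sequenced sample


def cohort_estimates(total_bp=TOTAL_BP,
                     n_samples=N_SAMPLES,
                     throughput_bp_per_day=THROUGHPUT_BP_PER_DAY,
                     kwh_per_100_gbp=KWH_PER_100_GBP):
    """Derive rough cohort-level numbers, assuming linear scaling (hypothetical helper)."""
    # Average amount of sequence per sample, in Gbp.
    mean_gbp_per_sample = total_bp / n_samples / 1e9
    # Days needed to process the cohort at the quoted daily throughput.
    days = total_bp / throughput_bp_per_day
    # Total energy if every 100 Gbp costs the quoted amount.
    energy_kwh = total_bp / 100e9 * kwh_per_100_gbp
    return {
        "mean_gbp_per_sample": mean_gbp_per_sample,
        "days": days,
        "energy_kwh": energy_kwh,
    }


if __name__ == "__main__":
    for name, value in cohort_estimates().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions, the cohort averages roughly 207 Gbp per sample, and processing all 1.16 Pbp would take on the order of 11.6 days and about 11,600 kWh. These are derived estimates only; the actual runtime and energy use of the study may differ.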
