Abstract
Quality control of sequencing data helps prevent low-quality data from affecting downstream applications in bioinformatics. The enormous growth of biological sequencing data in recent years introduces new challenges to the efficiency of quality control processes and motivates the need for fast implementations on modern compute systems. The next-generation heterogeneous Sunway platform holds significant potential for addressing this challenge. However, there are currently no dedicated quality control applications that can fully utilize its computational power. To bridge this gap, we introduce SWQC, a novel quality control application specifically designed for the Sunway platform. We present an efficient distributed FASTQ I/O framework for Sunway-based workstations and supercomputers that takes advantage of fast SSDs and the parallel file system. To exploit the platform's computational power through both process-level and thread-level (CPE-level) parallelism, we refactor and optimize all standard quality control modules for the heterogeneous Sunway architecture. On a single node, SWQC achieves speedups between 2× and 40× over highly optimized quality control applications executed on a high-end 48-core AMD server. When using 16 nodes, SWQC achieves parallel efficiencies of 70% (for reading and writing a single file) and 95% (for reading one file and writing split files) relative to a single node. Overall, SWQC performs quality control operations on a 140 GB FASTQ file within only 70 s using a single Sunway node. It is publicly available at https://github.com/RabbitBio/SWQC.
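To make the reported scaling figures concrete, the following is a minimal sketch of the arithmetic behind them. It assumes the conventional definition of parallel efficiency (speedup over a single node divided by the node count); the abstract does not spell out its definition, and the function names here are hypothetical illustrations, not part of SWQC.

```python
def parallel_efficiency(t_single: float, t_multi: float, nodes: int) -> float:
    """Efficiency = (t_single / t_multi) / nodes.

    Assumed (conventional) definition; the abstract does not state its own.
    """
    if nodes <= 0 or t_single <= 0 or t_multi <= 0:
        raise ValueError("times and node count must be positive")
    return (t_single / t_multi) / nodes


def implied_speedup(efficiency: float, nodes: int) -> float:
    """Speedup over one node implied by a given efficiency on `nodes` nodes."""
    return efficiency * nodes


def throughput_gb_per_s(size_gb: float, seconds: float) -> float:
    """Average end-to-end throughput for processing `size_gb` in `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_gb / seconds


if __name__ == "__main__":
    # Figures taken from the abstract: 16 nodes at 70% and 95% efficiency,
    # and a 140 GB file processed in 70 s on a single node.
    print(implied_speedup(0.70, 16))
    print(implied_speedup(0.95, 16))
    print(throughput_gb_per_s(140, 70))
```

Under this assumed definition, 70% and 95% efficiency on 16 nodes correspond to speedups of roughly 11.2× and 15.2× over a single node, and 140 GB in 70 s corresponds to an average of about 2 GB/s on one node.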