Abstract
Haplotype-resolved genome assemblies serve as vital resources in various research domains, including genomics, medicine, and pangenomics. Algorithms that use Hi-C data to generate haplotype-resolved assemblies are particularly advantageous because Hi-C data are readily available. Existing methods rely primarily on mapping quality to filter out uninformative Hi-C alignments, an approach that can be susceptible to sequencing errors. A high mapping quality threshold filters out numerous informative Hi-C alignments, whereas a low threshold compromises the accuracy of the retained alignments. Maintaining high accuracy while retaining as many Hi-C alignments as possible is therefore challenging. In our experiments, we found that heterozygous variations play an important role in filtering uninformative Hi-C alignments. Here, we introduce Diphase, a novel phasing tool that harnesses heterozygous variations to accurately identify the informative Hi-C alignments for phasing and to extend primary/alternate assemblies. Diphase combines mapping quality and heterozygous variations to filter uninformative Hi-C alignments, thereby enhancing the accuracy of phasing and the detection of switches. To validate its performance, we conducted a comparative analysis of Diphase, FALCON-Phase, and GFAse on various human datasets. The results demonstrate that Diphase achieves a longer phased block N50 and higher phasing accuracy while maintaining a lower Hamming error rate. The source code of Diphase is available at https://github.com/zhangjuncsu/Diphase. Supplementary data are available at Bioinformatics online.
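To make the filtering idea concrete, the following is a minimal sketch of how mapping quality and heterozygous sites might be combined to decide whether a Hi-C read pair is kept. It is not Diphase's actual implementation: the `Alignment` type, the `is_informative` rule (keep a read end if it passes a mapping quality threshold or covers at least one heterozygous site), and the threshold of 10 are all assumptions chosen for illustration.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """One end of a Hi-C read pair (hypothetical type for this sketch)."""
    contig: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    mapq: int


def overlaps_het_site(aln, het_sites):
    """Return True if the alignment covers at least one heterozygous site.

    het_sites maps contig name -> sorted list of 0-based positions.
    """
    positions = het_sites.get(aln.contig, [])
    i = bisect_left(positions, aln.start)  # first site at or after aln.start
    return i < len(positions) and positions[i] < aln.end


def is_informative(pair, het_sites, min_mapq=10):
    """Assumed rule: every end must either pass the MAPQ threshold
    or cover a heterozygous site that can tell the haplotypes apart."""
    return all(
        aln.mapq >= min_mapq or overlaps_het_site(aln, het_sites)
        for aln in pair
    )


het_sites = {"ctg1": [100, 500], "ctg2": [50]}

# Both ends map confidently: kept by mapping quality alone.
pair_confident = (Alignment("ctg1", 0, 50, 60), Alignment("ctg2", 200, 250, 60))
# Both ends have low MAPQ but each covers a heterozygous site: kept.
pair_rescued = (Alignment("ctg1", 90, 140, 0), Alignment("ctg2", 30, 80, 1))
# One end has low MAPQ and covers no heterozygous site: dropped.
pair_dropped = (Alignment("ctg1", 200, 250, 0), Alignment("ctg2", 30, 80, 60))

kept = [p for p in (pair_confident, pair_rescued, pair_dropped)
        if is_informative(p, het_sites)]
```

Under this assumed rule, a pair that a strict mapping quality filter alone would discard can still be retained when its ends cover heterozygous sites, which is the trade-off the abstract describes.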