Abstract

HapMap (haplotype map) projects have produced valuable genetic resources for the life science research community, allowing researchers to investigate sequence variations and conduct genome-wide association study (GWAS) analyses. A typical HapMap project may require sequencing hundreds, even thousands, of individual lines or accessions within a species. Due to limitations in current sequencing technology, the genotype values for some accessions cannot be clearly called. Additionally, allelic heterozygosity can be very high in some lines, causing genetic and sometimes phenotypic segregation in their descendants. Genetic and phenotypic segregation degrades the original accession’s specificity and makes it difficult to distinguish one accession from another. Therefore, it is vitally important to determine and validate HapMap accessions before conducting a GWAS analysis. However, to the best of our knowledge, there are no prior methodologies or tools that can readily distinguish or validate multiple accessions in a HapMap population. We devised a bioinformatics approach to distinguish multiple HapMap accessions using a minimal number of genetic markers. First, we assign each candidate marker a distinguishing score (DS), which measures its capability to distinguish accessions. The DS prioritizes markers with higher percentages of homozygous genotypes (allele combinations), as these can be stably passed on to offspring. Next, we apply the “set-partitioning” concept to select optimal markers by recursively partitioning accession sets. Subsequently, we build a hierarchical decision tree in which a specific path represents the selected markers and the homozygous genotypes that can be used to distinguish one accession from the others in the HapMap population.
Based on these algorithms, we developed a web tool named MAD-HiDTree (Multiple Accession Distinguishment-Hierarchical Decision Tree), designed to analyze a user-input genotype matrix and construct a hierarchical decision tree. Using genetic marker data extracted from the Medicago truncatula HapMap population, we successfully constructed hierarchical decision trees by which the original 262 M. truncatula accessions could be efficiently distinguished. PCR experiments verified our proposed method, confirming that MAD-HiDTree can be used for the identification of a specific accession. MAD-HiDTree was developed in C/C++ in Linux. Both the source code and test data are publicly available at https://bioinfo.noble.org/MAD-HiDTree/.
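The three steps described above (scoring markers, recursively partitioning accession sets, and reading a path off the resulting tree) can be illustrated with a minimal Python sketch. It is not the MAD-HiDTree implementation, which is written in C/C++. The scoring formula in `distinguishing_score` is an assumption made for illustration only (fraction of homozygous calls multiplied by the number of distinct homozygous genotypes), and the names `build_tree`, `path_to`, and the toy `matrix` are hypothetical.

```python
from collections import defaultdict


def is_homozygous(gt):
    """True for a called diploid SNP genotype with identical alleles, e.g. 'AA'."""
    return len(gt) == 2 and gt[0] == gt[1] and gt[0] in "ACGT"


def distinguishing_score(genotypes):
    """Illustrative score only (an assumption, NOT the published DS formula):
    fraction of homozygous calls times the number of distinct homozygous genotypes."""
    homo = [g for g in genotypes if is_homozygous(g)]
    if not homo:
        return 0.0
    return (len(homo) / len(genotypes)) * len(set(homo))


def build_tree(accessions, matrix, markers):
    """Recursively partition `accessions` using the best-scoring unused marker.

    matrix[marker][accession] is a genotype string such as 'AA' or 'AG'.
    """
    if len(accessions) <= 1:
        return {"accessions": sorted(accessions)}
    best, best_score = None, 0.0
    for m in markers:
        gts = [matrix[m][a] for a in accessions]
        if len(set(gts)) < 2:  # marker cannot split this subset
            continue
        score = distinguishing_score(gts)
        if score > best_score:
            best, best_score = m, score
    if best is None:  # no informative marker left: unresolved leaf
        return {"accessions": sorted(accessions)}
    groups = defaultdict(list)
    for a in accessions:
        groups[matrix[best][a]].append(a)
    remaining = [m for m in markers if m != best]
    children = {gt: build_tree(g, matrix, remaining) for gt, g in sorted(groups.items())}
    return {"marker": best, "children": children}


def path_to(tree, accession, matrix):
    """List the (marker, genotype) pairs on the path leading to one accession."""
    path, node = [], tree
    while "marker" in node:
        gt = matrix[node["marker"]][accession]
        path.append((node["marker"], gt))
        node = node["children"][gt]
    return path


# Toy genotype matrix: three markers, four accessions (made-up values).
matrix = {
    "m1": {"A1": "AA", "A2": "AA", "A3": "GG", "A4": "GG"},
    "m2": {"A1": "CC", "A2": "TT", "A3": "CC", "A4": "CT"},
    "m3": {"A1": "AA", "A2": "AA", "A3": "TT", "A4": "AA"},
}
tree = build_tree(["A1", "A2", "A3", "A4"], matrix, ["m1", "m2", "m3"])
print(path_to(tree, "A4", matrix))
```

In this toy example each accession ends in its own leaf, and the path for an accession lists the markers and genotypes that would have to be assayed to identify it. For simplicity the sketch branches on whatever genotype string is observed, including heterozygous calls; a stricter variant could refuse to branch on them.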

Highlights

  • A Haplotype Map (HapMap) project aims to develop a haplotype map of a genome of interest and describe the common patterns of genetic variations among individuals

  • We developed a user-friendly web-based pipeline, Genotype Marker Extracting and Filtering (GMEF), built on BCFtools command lines, to facilitate the creation of the genotype marker matrix file by extracting and filtering information from the HapMap variant calling file (.vcf/.bcf)

  • Once the analysis is complete, MAD-HiDTree will display a pop-up window (Figure 4C), prompting the user to download three result files (Supplementary File 2): HiDTree.txt, MarkerList.txt, and SubsetList.txt, which record the generated hierarchical decision tree, the selected marker index list, and the final partitioned accession list for each output subset, respectively
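To make the idea of a genotype marker matrix concrete, here is a small Python sketch that turns uncompressed VCF text into a marker-by-accession matrix. It is not GMEF, which according to the highlight above is built on BCFtools; the function name `vcf_to_genotype_matrix`, the `min_call_rate` filter, and the `NN` code for missing calls are all assumptions chosen for illustration.

```python
def vcf_to_genotype_matrix(lines, min_call_rate=0.9):
    """Return {marker_id: {sample: genotype}} for SNP records in VCF text lines.

    Assumptions for this sketch: markers are named 'CHROM:POS', missing calls
    are written as 'NN', and records below `min_call_rate` are dropped.
    """
    samples, matrix = [], {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):  # skip meta-information lines
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):  # header line: sample names start at column 10
            samples = fields[9:]
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        alleles = [ref] + alt.split(",")
        fmt = fields[8].split(":")
        # Keep only single-base alleles (SNPs) that carry a GT field.
        if "GT" not in fmt or any(len(a) != 1 for a in alleles):
            continue
        gt_idx = fmt.index("GT")
        calls = []
        for sample_field in fields[9:]:
            idx = sample_field.split(":")[gt_idx].replace("|", "/").split("/")
            calls.append("NN" if "." in idx else "".join(alleles[int(i)] for i in idx))
        called = sum(c != "NN" for c in calls)
        if calls and called / len(calls) >= min_call_rate:
            matrix[f"{chrom}:{pos}"] = dict(zip(samples, calls))
    return matrix


# Made-up example records for three accessions.
example_vcf = [
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "A1", "A2", "A3"]),
    "\t".join(["chr1", "100", ".", "A", "G", ".", "PASS", ".", "GT", "0/0", "1/1", "0/1"]),
    "\t".join(["chr1", "200", ".", "C", "T", ".", "PASS", ".", "GT:DP", "./.:0", "./.:0", "1/1:8"]),
]
genotype_matrix = vcf_to_genotype_matrix(example_vcf, min_call_rate=0.5)
print(genotype_matrix)
```

The second record is dropped because only one of three accessions has a called genotype. Real HapMap files are usually compressed and far larger, so a dedicated tool such as BCFtools is the practical choice for the actual extraction.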

Introduction

A HapMap project aims to develop a haplotype map of a genome of interest and to describe the common patterns of genetic variation among individuals. This typically requires sequencing and genotyping hundreds, even thousands, of individual lines/accessions. Sequence variation data are usually acquired by genome-wide sequence alignment (Gollery, 2005) and subsequent genotype calling (Nielsen et al, 2011). It has been reported that the genome-wide mean heterozygosity (HET) rate, estimated from Single Nucleotide Polymorphism (SNP) databases, may be as high as ∼0.2 in well-designed HapMap data (Li et al, 2007; Anderson et al, 2010).
