Abstract

Sequence data are deposited as unphased genotypes, so it is not possible to directly identify which parental chromosome, or haplotype, carries a particular allele. This study employed nonlinear time series modeling approaches to analyze haplotype sequences obtained by next-generation sequencing (NGS). To evaluate the chaotic behavior of haplotypes, we analyzed their whole sequences, as well as several subsequences from distinct haplotypes, in terms of the SNP distribution on their chromosomes. This analysis used chaos game representation (CGR) followed by two different scaling methods. Chaotic behavior was clearly present in most haplotype subsequences. To test the applicability of the proposed model, we determined the alleles at gap positions and at positions with low coverage, using chromosome subsequences in which 10% of the alleles in each subsequence were replaced by gaps. After the CGR of each subsequence was converted into a coordinate series, a Local Projection (LP) method predicted the values at the ambiguous positions of the coordinate series. The average reconstruction rate over all input data exceeded 97%, demonstrating that this approach can effectively improve the reconstruction rate of given haplotypes.
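As a minimal sketch of the CGR step, assuming the haplotype alleles are written as nucleotides and using the common DNA corner convention (A at (0,0), C at (0,1), G at (1,1), T at (1,0)), the code below turns a sequence into its CGR coordinate series. Because each CGR point lies in the quadrant of the corner that generated it, a predicted coordinate can be mapped back to an allele, which gives one possible way to score a reconstruction rate over masked positions. The function names, the corner assignment and the random 10% masking are illustrative assumptions; the study's exact encoding and its two scaling methods are not reproduced here.

```python
import random
from fractions import Fraction

# Assumed allele-to-corner mapping (common DNA CGR convention);
# the study's own encoding may differ.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
QUADRANT = {corner: allele for allele, corner in CORNERS.items()}
HALF = Fraction(1, 2)


def cgr_series(seq):
    """CGR coordinate series: each point moves halfway from the previous
    point toward the corner of the current allele, starting at the centre.
    Exact fractions keep long runs of one allele from rounding onto 1/2."""
    x, y = HALF, HALF
    points = []
    for allele in seq:
        cx, cy = CORNERS[allele]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def decode_point(point):
    """Map a (true or predicted) CGR point to an allele by its quadrant."""
    x, y = point
    return QUADRANT[(int(x > HALF), int(y > HALF))]


def mask_positions(n, fraction=0.10, seed=0):
    """Hypothetical helper: choose the positions to hide as gaps."""
    rng = random.Random(seed)
    k = max(1, round(n * fraction))
    return sorted(rng.sample(range(n), k))


def reconstruction_rate(true_seq, predicted, masked):
    """Share of masked positions whose predicted point decodes to the true
    allele. `predicted` maps each masked position to an (x, y) coordinate."""
    if not masked:
        raise ValueError("no masked positions to score")
    hits = sum(decode_point(predicted[i]) == true_seq[i] for i in masked)
    return hits / len(masked)


haplotype = "ACGTTGCAAGCTAGCTTACG"
points = cgr_series(haplotype)
masked = mask_positions(len(haplotype))       # 2 of the 20 positions
perfect = {i: points[i] for i in masked}      # placeholder for model output
print(reconstruction_rate(haplotype, perfect, masked))  # prints 1.0
```

Feeding the true coordinates back in as "predictions" scores 1.0 by construction; in the study's setting the predicted coordinates would come from the LP step instead.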

Highlights

  • More than 99% of the human genome is identical among individuals, including individuals from different ethnic groups

  • To identify genes involved in genetic diseases, genome-wide association studies (GWASs) used massive amounts of SNP and haplotype data to detect highly significant statistical associations between SNPs and a wide range of phenotypes[12]

  • The current study investigated the chaotic behavior of haplotype sequences by considering the distribution of SNPs and mapping them with the chaos game representation (CGR) algorithm
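The abstract's gap-filling step predicts values of a coordinate series from similar states elsewhere in the same series. The Local Projection (LP) method itself is not reproduced here; the sketch below swaps in a simpler nearest-neighbour (method-of-analogues) predictor that works on the same kind of time-delay embedding. The function names and the parameters `dim`, `tau` and `k` are hypothetical, and applying the predictor separately to the x and y coordinate series is an assumption.

```python
import math


def predict_next(history, dim=3, tau=1, k=3):
    """Predict the value that follows `history` by averaging the successors
    of the k past delay vectors nearest to the most recent delay vector."""
    span = (dim - 1) * tau
    if len(history) < span + 2:
        raise ValueError("history too short for this embedding")

    def delay_vector(t):
        # (s[t], s[t - tau], ..., s[t - (dim - 1) * tau])
        return [history[t - j * tau] for j in range(dim)]

    query = delay_vector(len(history) - 1)
    # Every earlier delay vector whose successor is known is a candidate.
    candidates = [
        (math.dist(delay_vector(t), query), history[t + 1])
        for t in range(span, len(history) - 1)
    ]
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)


def fill_gaps(series, gaps, **params):
    """Fill gaps left to right, so each prediction can use earlier fills."""
    filled = list(series)
    for i in sorted(gaps):
        filled[i] = predict_next(filled[:i], **params)
    return filled


series = [float(i % 4) for i in range(40)]  # toy periodic series
series[30] = series[35] = None              # hide two values
restored = fill_gaps(series, [30, 35])
print(restored[30], restored[35])           # prints 2.0 3.0
```

Local projection schemes, as usually described, go further than this: instead of averaging the successors of the neighbours, they fit a local linear subspace through the neighbouring delay vectors (from their covariance) and project onto it, which makes them more robust to noise.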

Introduction

More than 99% of the human genome is identical among individuals, including individuals from different ethnic groups. The remaining genetic differences, less than 1% of the genome, are responsible for the variation observed among people all over the world[1]. Identifying these differences and evaluating their distribution along the DNA sequences of different human populations may have important implications for solving various problems in biology and medicine. Both SNPs and haplotypes provide valuable information for systematically assessing genetic variation. Research fields such as disease susceptibility, drug design, and genome-wide association studies (GWASs)[5] can benefit greatly from these data. Many studies have investigated the distribution of SNPs across genome elements. They have shown that SNPs tend to cluster across genome elements in a deterministic manner, in which the position of each mutation is usually affected by its neighbors, and that SNP sequences are often highly correlated with each other[6,7,8]. A chaotic viewpoint has also been applied to evaluate biological signals such as electroencephalogram (EEG) signals[33,34,35].

