Abstract

We have entered the era of individual genomic sequencing, and the field is already progressing at an exponential pace. Excluding false-positive variants from reported datasets is of utmost importance. However, because of the nature of the algorithms in use, this task has not yet been optimized to the required level of precision. This study presents a unique strategy for identifying SNPs, called COIN-VGH, that substantially reduces the number of false positives in the generated data. The algorithm was developed using the X-chromosome-specific regions of the previously sequenced genomes of Craig Venter and James Watson. It is based on the concept that a nucleotide can be individualized when it is analyzed in the context of its surrounding genomic sequence. COIN-VGH first defines the most comprehensive set of nucleotide strings of a defined length that map with 100% identity to a unique position within the human reference genome (HRG). This set is then used to retrieve sequence reads from a query genome (QG), producing a genomic landscape that represents a draft HRG-guided assembly of the QG. The landscape is analyzed for specific signatures that indicate the presence of SNPs. The fidelity of the variation signature was assessed in simulation experiments in which the HRG was virtually altered at defined positions. Finally, the signature regions identified in the HRG and in the QG reads are aligned, and the precise nature and position of the corresponding SNPs are determined. The advantages of COIN-VGH over previous algorithms are discussed.
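The pipeline outlined in the abstract (unique reference strings, read retrieval, landscape, signature) can be illustrated with a toy Python sketch. This is not the COIN-VGH implementation. The function names, the exact-substring matching, and the signature used here (a run of exactly k consecutive unique reference k-mers with zero read support, which a single substitution would be expected to produce) are all assumptions made for illustration; the abstract does not specify the string length, the matching procedure, or the form of the signature.

```python
from collections import Counter


def unique_kmers(reference: str, k: int) -> dict[str, int]:
    """Map every k-mer that occurs exactly once in the reference to its start."""
    n = len(reference) - k + 1
    counts = Counter(reference[i:i + k] for i in range(n))
    return {reference[i:i + k]: i for i in range(n)
            if counts[reference[i:i + k]] == 1}


def coverage_landscape(reference: str, reads: list[str], k: int) -> list:
    """Count, per reference start position, the reads containing its unique k-mer.

    Positions whose k-mer is not unique in the reference are marked None.
    """
    unique = unique_kmers(reference, k)
    landscape = [None] * (len(reference) - k + 1)
    for pos in unique.values():
        landscape[pos] = 0
    for read in reads:
        seen = set()  # count each reference position at most once per read
        for i in range(len(read) - k + 1):
            pos = unique.get(read[i:i + k])
            if pos is not None and pos not in seen:
                seen.add(pos)
                landscape[pos] += 1
    return landscape


def candidate_snp_positions(landscape: list, k: int) -> list[int]:
    """Assumed signature: exactly k consecutive zero-coverage k-mers.

    A substitution at position p breaks the k-mers starting at p-k+1 .. p,
    so a run starting at s points to position s + k - 1.
    """
    candidates = []
    run_start = None
    for i, depth in enumerate(landscape + [1]):  # sentinel closes a final run
        if depth == 0:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start == k:
                candidates.append(run_start + k - 1)
            run_start = None
    return candidates


# Toy data: the query differs from the reference by one substitution at index 7.
reference = "ACGTAGCTTGACCATG"
query = "ACGTAGCATGACCATG"
reads = [query[i:i + 8] for i in range(len(query) - 7)]  # overlapping 8-base reads
print(candidate_snp_positions(coverage_landscape(reference, reads, 4), 4))  # [7]
```

In this toy setting the flagged position would still need to be confirmed by aligning the reads against the reference around it, which corresponds to the final alignment step described in the abstract. Variants close to the sequence ends, or adjacent to non-unique k-mers, produce shorter runs and are deliberately ignored by this sketch.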
