Abstract

The goal of this work is to describe the advantages of applying Conceptual Modeling (CM) in complex domains such as genomics. Studying and understanding the human genome remains a major challenge because of its high level of complexity. The constant evolution of the genomic domain generates ever larger amounts of new data, and if this data is not managed correctly its quality can be compromised (e.g., through heterogeneity and inconsistency). In this paper, we propose the use of a Conceptual Schema of the Human Genome (CSHG), designed to understand and improve our ontological commitment to the domain, and we extend (enrich) this schema by integrating a novel concept: haplotypes. Our focus is on improving the understanding of the relationship between genotype and phenotype, since new findings show that this relationship is more complex than was originally thought. Here we present the first steps in our data management approach for haplotypes (variations, frequencies and populations) and discuss how the database evolves to support this data. Each new version of our conceptual schema (CS) introduces changes to the underlying database structure that have essential and practical implications for better understanding and managing the relevant information. A solution based on conceptual models gives a clear definition of the domain, with direct implications for the medical field (Precision Medicine), in which Genomic Information Systems (GeIS) play a very important role.

Highlights

  • As the application of Next Generation Sequencing (NGS) technologies generates ever larger amounts of new data, we need to build structures to organize, process and use it in order to take full advantage of this new knowledge and improve our understanding of the human genome

  • We detected the importance of including haplotype treatment in our Conceptual Schema of the Human Genome (CSHG) during our work on the genetic implications for the pathology of Alcohol Sensitivity, in which we intensively studied the genes and variants associated with a predisposition to this disease [5,6]

  • We found only a kind of table schema specification in UCSC. dbSNP shows only one attribute associated with the concept of haplotypes in its schema, which makes its definition very limited

Summary

Introduction

As the application of Next Generation Sequencing (NGS) technologies generates ever larger amounts of new data, we need to build structures to organize, process and use it in order to take full advantage of this new knowledge and improve our understanding of the human genome. Pastor et al. describe the Conceptual Schema of the Human Genome (CSHG) [4], [24]. This model should be extended in two ways: 1) integrating the treatment of haplotypes, and 2) applying statistical models. In this context, the goal of the present study, which is based on our previous work [48], is to extend the CSHG by including the concepts of haplotypes and statistical models, thereby improving the schema’s expressiveness.

Background
Related Work
High dispersion and data redundancy
Conceptual Modeling of Haplotypes
Conceptual model validation
X: Chromosome Y
Development of a haplotype database
Findings
Lessons Learned and Future Work
