Abstract

The database of Genotypes and Phenotypes (dbGaP) contains various types of data generated in Genome Wide Association Studies (GWAS). These data can be used to facilitate novel scientific discovery and to reduce the cost and time of exploratory research. However, idiosyncrasies in variable names are a major barrier to reusing these data. We studied the problem of formalizing phenotype variable descriptions using Clinical Element Models (CEMs). Direct mapping of 379 phenotype names to existing CEMs yielded a low rate of exact matches (N=25, about 6.6%). However, the flexible and expressive information models underlying CEMs provided a robust means of representing 115 phenotype variable descriptions, indicating that CEMs can be successfully applied to standardize a large portion of the clinical variables contained in dbGaP.

Highlights

  • With the advancements in genome-wide association studies (GWAS), the number of public genotypic and phenotypic data repositories, such as the database of Genotypes and Phenotypes (dbGaP), has significantly increased [1,2].

  • In Phase II, we modeled 200 original phenotype variables selected from two phenotype data dictionaries in dbGaP using the Clinical Element Model (CEM) template models.

  • We investigated whether CEM template models could be applied to formalize the dbGaP variables.


Summary

Introduction

With the advancements in genome-wide association studies (GWAS), the number of public genotypic and phenotypic data repositories, such as the database of Genotypes and Phenotypes (dbGaP), has significantly increased [1,2]. Phenotype variables are often named without a specific naming convention, or are labeled with abbreviated codes that do not convey a clear meaning. Many of these variables are accompanied by descriptions that help users understand what data the variable is intended to represent.

Challenges in standardizing phenotype variables in dbGaP

dbGaP contains various types of data generated in many GWAS, such as phenotypes, genotypes, and pedigree information of subjects, as well as specifics on samples, measurements, and experiments. Challenges associated with non-standardized phenotype variables generated in different research institutions are widely recognized [6]. Standardization of the phenotype variables collected from different institutions and studies is a common challenge for these initiatives [7].

