Abstract
Genetic variations play a crucial role in differential phenotypic outcomes. Given the complexity of establishing this correlation and the enormous volume of data available today, it is imperative to design machine-readable, efficient methods to store, label, search and analyze these data. A semantic approach, FROG: “FingeRprinting Ontology of Genomic variations”, is implemented to label variation data based on its location, function and interactions. FROG has six levels to describe the variation annotation, namely chromosome, DNA, RNA, protein, variations and interactions. Each level is a conceptual aggregation of logically connected attributes, each of which comprises various properties of the variant. For example, at the chromosome level, one attribute is the location of the variation, which has two properties: allosomes or autosomes. Another attribute is the variation kind, which has four properties: indel, deletion, insertion and substitution. In total, there are 48 attributes and 278 properties to capture the variation annotation across the six levels. Each property is then assigned a bit score, and the combination of these properties (mostly taken from existing variation ontologies) yields a binary fingerprint. FROG is a novel method designed to label the entire body of variation data generated to date for efficient storage, search and analysis. A web-based platform is designed as a test case for users to navigate sample datasets and generate fingerprints. The platform is available at http://ab-openlab.csir.res.in/frog.
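The fingerprinting idea described above can be illustrated with a minimal sketch. It is not FROG's actual encoding: the abstract does not specify the bit layout, so the sketch assumes one bit per property, set to 1 when the variant has that property. The schema covers only the two chromosome-level attributes named in the abstract, and the names `SCHEMA` and `fingerprint` are hypothetical.

```python
# Hypothetical toy schema: (level, attribute) -> ordered list of properties.
# Only the two chromosome-level attributes named in the abstract are included;
# the real ontology has 48 attributes and 278 properties across six levels.
SCHEMA = [
    (("chromosome", "location of variation"), ["allosomes", "autosomes"]),
    (("chromosome", "variation kind"),
     ["indel", "deletion", "insertion", "substitution"]),
]


def fingerprint(annotation):
    """Return a binary string with one bit per property in SCHEMA order.

    Assumption: a property's bit is 1 if the variant has that property and
    0 otherwise. Attributes missing from the annotation contribute all zeros.
    """
    bits = []
    for key, properties in SCHEMA:
        value = annotation.get(key)
        if value is not None and value not in properties:
            raise ValueError(f"unknown property {value!r} for attribute {key!r}")
        bits.extend("1" if prop == value else "0" for prop in properties)
    return "".join(bits)


# An autosomal substitution: bits are "01" (location) followed by "0001" (kind).
example = {
    ("chromosome", "location of variation"): "autosomes",
    ("chromosome", "variation kind"): "substitution",
}
print(fingerprint(example))  # 010001
```

Under this assumed layout, every variant maps to a fixed-length bit string, so two variants can be compared position by position regardless of which attributes were annotated.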
Highlights
Genomic variations have been studied extensively to understand their role in disease association and drug responses
Many tools that predict the functional outcome of structural variations, such as SIFT [9], PolyPhen-2 [10] and PHD-SNP [11], have been developed to facilitate understanding of the role of genomic variation in the context of potential phenotypic impact
The concept of fingerprinting in FROG is based on a general understanding of the functional impact of any variation
Summary
Genomic variations have been studied extensively to understand their role in disease association and drug responses. Many repositories are designed to capture data related to human diseases, such as OMIM, with information on ~14,000 genes [6], and GAD, with over 130,000 records on human genetic association studies of complex diseases and disorders [7]. In this context, global efforts in the form of centralized archives and platforms, such as dbGaP [5] and GWAS Central [4], have been developed to capture genotype-phenotype interaction studies. There is clearly a need to develop systems that take advantage of this multidimensional big data to establish robust genotype-to-phenotype correlations.