Abstract

Motivation and Objectives: Within a living organism, genome and proteome variations may influence many molecular interactions and biochemical pathways, impairing the proper activity of cells, tissues, and organs; ultimately, this may be the cause of many syndromes and diseases. It is now well known that tumors may arise as a result of a series of DNA sequence abnormalities and mutations. It is therefore not surprising that a vast amount of information is available in the scientific literature and that many information systems are devoted to managing the related data. Of particular interest among these are the many Locus Specific Databases (LSDBs) and the COSMIC (Catalogue of Somatic Mutations in Cancer) database (Forbes et al., 2011). Such data, however, are not yet sufficiently integrated with other molecular, biomedical, and clinical databases, and new efforts are needed in this direction. Data retrieval, search, and integration solutions in bioinformatics increasingly rely on the set of standards and technologies that underpin the Semantic Web framework (Berners-Lee et al., 2001). This framework is intended to evolve the web into a distributed knowledge base, and a first step in this evolution is the generation of a Web of Data (Bizer et al., 2009). In this view, Linked Data is an approach to data integration that employs ontologies, terminologies, Uniform Resource Identifiers (URIs), and the Resource Description Framework (RDF) to connect pieces of data, information, and knowledge on the Semantic Web (Belleau et al., 2008). In particular, RDF describes semantically rich information on the web as a composition of simple triples (statements) of the form (‘Subject’, ‘Property’, ‘Object’), which link entities through relations that are expressed using ontologies and identified by URIs (see the RDF reference site: http://www.w3.org/RDF/, last accessed on October 3, 2012).
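The triple structure described above can be illustrated with a minimal sketch. It assumes a hypothetical `http://example.org/` namespace and illustrative property names and values; none of these are the URIs or vocabulary actually used for COSMIC. The helper names (`Triple`, `to_ntriples`) are likewise invented for this example, which prints each statement in N-Triples syntax.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the entity being described
    predicate: str  # URI of the property, normally taken from an ontology
    obj: str        # URI of another entity, or a plain literal value


# Hypothetical namespace used only for illustration.
EX = "http://example.org/"


def is_uri(value: str) -> bool:
    """Treat http(s) strings as URIs; everything else is a literal."""
    return value.startswith(("http://", "https://"))


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a single N-Triples line."""
    if is_uri(t.obj):
        obj = f"<{t.obj}>"
    else:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# Two illustrative statements about the same (hypothetical) mutation record.
triples = [
    Triple(EX + "mutation/1", EX + "vocab/affectsGene", EX + "gene/TP53"),
    Triple(EX + "mutation/1", EX + "vocab/aminoAcidChange", "p.R175H"),
]

for t in triples:
    print(to_ntriples(t))
```

Because both statements share the same subject URI, a consumer can merge them with statements about that entity published elsewhere, which is the linking mechanism that Linked Data relies on.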
A relevant contribution to this vision comes from the conversion of data stored in relational databases (RDB) into RDF. There is a vast amount of information on human variation in the literature and in several mutation and variation databases but, to our knowledge, this kind of information is still scarce in the Web of Data. There are several motivations for using Semantic Web technologies and publishing life sciences datasets as Linked Data: doing so improves data and information integration, the shareability of openly accessible data through standard, programmatic interfaces, semantic normalization, data discoverability, and query federation across distributed sources. A first work carried out by our group led to the implementation of an RDF version (Zappa et al., 2012) of the IARC TP53 Somatic Mutation database (IARCDB) (Petitjean et al., 2007). Here, we present the initial development of an RDF version of the COSMIC database by means of Semantic Web technologies.
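As a rough illustration of what converting relational data into RDF involves, the sketch below maps each row of a table to a subject URI and each non-key column to a property. It is not the conversion pipeline used for COSMIC or IARCDB: the table name, the column names, the sample row, the `http://example.org/` namespace, and the `row_to_triples` helper are all assumptions made for this example.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def row_to_triples(table, key_column, row):
    """Map one relational row to (subject, predicate, literal) triples.

    The primary key becomes part of the subject URI; every other
    non-NULL column becomes one property with a literal value.
    """
    subject = f"{BASE}{table}/{row[key_column]}"
    triples = []
    for column, value in row.items():
        if column == key_column or value is None:
            continue
        triples.append((subject, f"{BASE}vocab/{column}", str(value)))
    return triples


# A tiny in-memory table standing in for a real mutation database.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE mutation (id INTEGER PRIMARY KEY, gene TEXT, aa_change TEXT)"
)
conn.execute("INSERT INTO mutation VALUES (1, 'TP53', 'p.R175H')")

rdf = []
for row in conn.execute("SELECT * FROM mutation"):
    rdf.extend(row_to_triples("mutation", "id", dict(row)))
conn.close()

# Print the result as N-Triples lines with literal objects.
for s, p, o in rdf:
    print(f'<{s}> <{p}> "{o}" .')
```

A real conversion would additionally map foreign keys to links between entities and replace the ad hoc `vocab/` properties with terms from shared ontologies, so that the resulting data can be queried together with other sources.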
