Abstract
The realisation that vast amounts of pharmacological data for small molecules are continuously being reported in bibliographic sources has, in recent years, prompted initiatives aimed at collecting, organising, and storing these data together with chemical structures. Today, numerous databases connect hundreds of thousands of small molecules to thousands of biological responses arising from their interaction with macromolecules. Some of these repositories, such as GLIDA, PDSP, BindingDB, IUPHARdb, PubChem, ChEMBL, and DrugBank, make all data available in the public domain.[1] Others, such as BioPrint, Integrity, Wombat, and GOSTAR, offer access to their data only through licensing from the respective commercial providers.[2] This diversity of sources hinders direct access to, and interrogation of, their combined contents. Integrating all these repositories into a single accessible resource is not trivial, mainly because different vocabularies and ontologies are used for the various domain entities, which makes cross-referencing among multiple sources challenging.[3] Even if some degree of integration is accomplished, managing and updating such an integrated framework efficiently may require significant human resources, be extremely time consuming, and prove difficult to fully automate.[4] Managing chemical structures, for instance, involves attending to details such as salt formulation and isomerism (tautomerism, regioisomerism, and optical and geometrical isomerism), and it has been reported that different molecular identifiers may lead to a substantially different number of unique chemical structures depending on the user's criteria for defining uniqueness.[5]
On the other hand, managing pharmacological data across databases is also complicated, as one may encounter different values for the same molecule–protein interaction obtained from different laboratories, for the same protein from different species, or for the same protein and species under different settings and conditions.[6] In parallel, recent initiatives have sought to attach conceptual meaning to the connections established between objects from different domains, so that the links are stored in a form that is more understandable to computers. This is the main goal of applying
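The difficulty of reconciling several reported values for one molecule–protein interaction can be illustrated with a minimal Python sketch. It assumes activity records have already been extracted into simple tuples; the molecule, target, and laboratory names, the pKi values, and the `summarise` helper are all hypothetical and serve only to show how the choice of grouping key changes the summary.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (molecule, target, species, source, pKi).
# Every name and value here is invented for illustration only.
records = [
    ("mol_A", "target_X", "human", "lab_1", 7.2),
    ("mol_A", "target_X", "human", "lab_2", 6.8),
    ("mol_A", "target_X", "rat", "lab_1", 5.9),
    ("mol_B", "target_X", "human", "lab_3", 8.1),
]

FIELD_INDEX = {"molecule": 0, "target": 1, "species": 2, "source": 3}


def summarise(records, key_fields):
    """Group pKi values by the chosen fields; report count, median, and spread."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[FIELD_INDEX[name]] for name in key_fields)
        groups[key].append(rec[4])
    return {
        key: {
            "n": len(values),
            "median": median(values),
            "spread": max(values) - min(values),
        }
        for key, values in groups.items()
    }


# Keeping species apart versus pooling them gives different summaries.
by_species = summarise(records, ("molecule", "target", "species"))
pooled = summarise(records, ("molecule", "target"))
```

In this toy data set, pooling the human and rat measurements for `mol_A` widens the spread of values considerably, which is one reason an integrated resource would need to record species and assay conditions alongside each value.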
Highlights
The realisation that vast amounts of pharmacological data for small molecules are continuously being reported in bibliographic sources has, in recent years, prompted initiatives aimed at collecting, organising, and storing these data together with chemical structures.
Managing chemical structures, for instance, involves attending to details such as salt formulation and isomerism, and it has been reported that different molecular identifiers may lead to a substantially different number of unique chemical structures depending on the user's criteria for defining uniqueness.[5]
Worth mentioning are Bio2RDF,[9] which codifies the contents of different public biological databases into a resource description framework (RDF); Linking Open Drug Data (LODD),[10] which performs a similar task but is focussed mainly on drug data; and Chem2Bio2RDF,[11] which integrates small molecule and drug information with protein targets, genes, and pathways, and allows cross-source linking with LODD and Bio2RDF.
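The highlight on molecular identifiers can be made concrete with a small Python sketch. It assumes each structure is already encoded as an InChIKey-style string, in which the first 14-character block reflects atom connectivity and the following block reflects further layers such as stereochemistry. The keys below are invented placeholders in that layout, not real compounds, and `count_unique` is a hypothetical helper. Salt stripping and tautomer handling would need a cheminformatics toolkit and are not shown.

```python
# Placeholder keys in the InChIKey layout (14-10-1 characters).
# They are invented for illustration and do not identify real compounds.
keys = [
    "AAAAAAAAAAAAAA-RRRRRRRRSA-N",  # one stereoisomer, from source 1
    "AAAAAAAAAAAAAA-RRRRRRRRSA-N",  # same stereoisomer, from source 2
    "AAAAAAAAAAAAAA-SSSSSSSSSA-N",  # the other stereoisomer
    "AAAAAAAAAAAAAA-UUUUUUUUSA-N",  # same skeleton, stereo unspecified
    "CCCCCCCCCCCCCC-UUUUUUUUSA-N",  # a different skeleton
]


def count_unique(keys, criterion="full"):
    """Count unique structures under a chosen definition of uniqueness."""
    if criterion == "full":
        # Exact identity: stereoisomers count as separate structures.
        reduced = set(keys)
    elif criterion == "connectivity":
        # Compare only the first block: stereoisomers collapse together.
        reduced = {key.split("-")[0] for key in keys}
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return len(reduced)


n_full = count_unique(keys, "full")
n_connectivity = count_unique(keys, "connectivity")
```

The same five records yield four unique structures under exact matching and two when only connectivity is compared, so any count of "unique compounds" in an integrated resource depends on which criterion was applied.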
Summary
The realisation that vast amounts of pharmacological data for small molecules are continuously being reported in bibliographic sources has, in recent years, prompted initiatives aimed at collecting, organising, and storing these data together with chemical structures. Open PHACTS is a recently funded European project that applies semantic web standards and technologies to create an integrated open pharmacological space (OPS) aimed at facilitating open innovation in drug discovery research.[12] With this semantic approach, Open PHACTS aspires to solve some of the main bottlenecks of current data access and knowledge generation in drug discovery, namely, access to multiple disparate heterogeneous information sources, the lack of standards and common identifiers for domain entities, and the ability to interrogate the system with complex research questions.
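The idea of storing cross-source links as machine-readable statements can be sketched in a few lines of plain Python. This is an illustration of subject-predicate-object triples in general, not a description of how Open PHACTS or any of the cited resources is implemented. All identifiers (`srcA:`, `srcB:`, `ex:`) and the helpers `match` and `targets_of` are hypothetical; `owl:sameAs` is used here only as a conventional label for an identity link.

```python
# A toy triple store: each statement is (subject, predicate, object).
# All identifiers are invented for illustration.
triples = {
    ("srcA:compound/1", "ex:hasName", "compound one"),
    ("srcA:compound/1", "owl:sameAs", "srcB:mol/42"),
    ("srcB:mol/42", "ex:bindsTo", "srcB:protein/7"),
    ("srcB:protein/7", "ex:fromSpecies", "human"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def targets_of(triples, compound):
    """Find binding targets, following identity links one hop in either direction."""
    aliases = {compound}
    aliases |= {o for _, _, o in match(triples, s=compound, p="owl:sameAs")}
    aliases |= {s for s, _, _ in match(triples, p="owl:sameAs", o=compound)}
    found = set()
    for alias in aliases:
        found |= {o for _, _, o in match(triples, s=alias, p="ex:bindsTo")}
    return sorted(found)


linked_targets = targets_of(triples, "srcA:compound/1")
```

Here the compound known to one source has no binding statement of its own, yet the identity link to the second source's record lets the query reach the target stored there. A production system would use a proper RDF store and query language rather than this in-memory set.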