The realisation that vast amounts of pharmacological data for small molecules are continuously being reported in bibliographic sources has prompted, in recent years, a number of initiatives aimed at collecting, organising, and storing these data together with chemical structures. Today, numerous databases connect hundreds of thousands of small molecules to thousands of biological responses arising from their interaction with macromolecules. Some of these repositories, such as GLIDA, PDSP, BindingDB, IUPHARdb, PubChem, ChEMBL, and DrugBank, make all data available in the public domain. [1] Others, such as BioPrint, Integrity, Wombat, and GOSTAR, offer access to their data only through licensing from the respective commercial providers. [2] This wide diversity of sources hinders direct access to, and interrogation of, the entire contents covered by all of them. Integrating all these repositories into a single accessible resource is not trivial, mainly because different vocabularies and ontologies are used for the various domain entities, which makes cross-referencing among multiple sources a challenging task. [3] But even if some degree of integration is accomplished, managing and updating such an integrated framework efficiently may require significant human resources, be extremely time consuming, and be difficult to fully automate. [4] Managing chemical structures, for instance, involves taking into consideration a fair number of details, such as salt formulation and isomerism (tautomerism, regioisomerism, and optical and geometrical isomerism), and it has been reported that different molecular identifiers may actually lead to an essentially different number of unique chemical structures depending on the user's criteria for defining uniqueness.
[5] Managing pharmacological data across databases is likewise complicated, as one may encounter different values for the same molecule-protein interaction obtained from different laboratories, from the same protein but different species, or from the same protein and species but different settings and conditions. [6] In parallel, there have been some recent initiatives to give conceptual meaning to the connections established between objects from different domains, so that the links are stored in a way that is more understandable to computers. This is the main goal of applying