Ontologies encode the knowledge of human experts so that computers can automate common tasks in a domain. They are hierarchically organised and backed by computational logic, which allows the implicit consequences of explicitly stated knowledge to be inferred automatically. ChEBI is a database and ontology of chemical entities of biological interest [1]. Within the ontology, chemical entities are classified both by shared structural features and by their roles and activities in biological systems. For example, the structure-based class ‘aminopyridine’ is defined as ‘Compounds containing a pyridine skeleton substituted by one or more amine groups’, whereas the role-based class ‘antiviral drug’ groups together chemical entities that are used as antiviral drugs, regardless of their chemical structure. We have developed a novel semi-automated system for creating structure-based chemical class definitions. Our tool allows curators to draw and visually define the structural features shared by a class of chemicals; these definitions are then used to detect class membership automatically across the full chemical database. The front end is based on an extended JChemPaint [2] and the Google Web Toolkit, and the back end on a custom extension of the Chemistry Development Kit [3]. With this tool, chemical classes can be defined in terms of molecular skeletons, substituent groups, arbitrary parts including cycles of arbitrary length, formulae and overall properties, and these features can be combined using nested logical operators. Definitions are matched to candidate structures from the database by an in-memory matching procedure, and the results are validated against the existing manually curated classification in ChEBI. This allows us to iteratively refine the class definitions and to improve the quality of the classification in ChEBI.
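To illustrate how structural features might be combined with nested logical operators, the following is a minimal Python sketch, not the tool's actual implementation (which builds on JChemPaint and the Chemistry Development Kit). It assumes that each molecule has already been reduced to a flat set of feature labels; the `Molecule` record, the helper names (`has_skeleton`, `has_substituent`, `all_of`, `any_of`, `not_`, `members`) and the three toy entries are all hypothetical and hand-written for illustration, not taken from ChEBI. A real matcher would test substructures on the molecular graph, including where a substituent is attached, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

@dataclass(frozen=True)
class Molecule:
    """Hypothetical, pre-computed feature summary of one structure."""
    name: str
    skeletons: FrozenSet[str]
    substituents: FrozenSet[str]

# A class definition is any predicate over a molecule.
Rule = Callable[[Molecule], bool]

def has_skeleton(label: str) -> Rule:
    return lambda m: label in m.skeletons

def has_substituent(label: str) -> Rule:
    return lambda m: label in m.substituents

# Logical operators return rules themselves, so they can be nested freely.
def all_of(*rules: Rule) -> Rule:
    return lambda m: all(rule(m) for rule in rules)

def any_of(*rules: Rule) -> Rule:
    return lambda m: any(rule(m) for rule in rules)

def not_(rule: Rule) -> Rule:
    return lambda m: not rule(m)

def members(rule: Rule, molecules: List[Molecule]) -> List[str]:
    """Names of all molecules in the in-memory list that satisfy the rule."""
    return [m.name for m in molecules if rule(m)]

# Toy data, annotated by hand for this example only.
TOY_DB = [
    Molecule("4-aminopyridine", frozenset({"pyridine"}), frozenset({"amine"})),
    Molecule("pyridine", frozenset({"pyridine"}), frozenset()),
    Molecule("aniline", frozenset({"benzene"}), frozenset({"amine"})),
]

# Simplified reading of 'a pyridine skeleton substituted by amine groups'.
aminopyridine = all_of(has_skeleton("pyridine"), has_substituent("amine"))

# A nested definition: carries an amine but has no pyridine skeleton.
amine_without_pyridine = all_of(has_substituent("amine"),
                                not_(has_skeleton("pyridine")))

print(members(aminopyridine, TOY_DB))
print(members(amine_without_pyridine, TOY_DB))
```

Because every operator returns another rule, definitions compose to arbitrary depth, and the list returned by `members` could then be compared with a manually curated class to find disagreements worth reviewing.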