Abstract

Increasing age is a risk factor for many diseases; developing pharmacological interventions that slow down ageing, and consequently postpone the onset of many age-related diseases, is therefore highly desirable. In this work we analyse data from the DrugAge database, which contains chemical compounds and their effect on the lifespan of model organisms. We built random forest models to predict whether or not a chemical compound will increase Caenorhabditis elegans’ lifespan, using two types of features: Gene Ontology (GO) terms annotated for the proteins targeted by each compound, and chemical descriptors calculated from each compound's chemical structure. The model with the best predictive accuracy used both biological and chemical features, achieving a prediction accuracy of 80%. The top 20 most important GO terms include terms related to mitochondrial, enzymatic, immunological, metabolic and transport processes. We then applied our best model to compounds in the DGIdb database, whose effect on an organism's lifespan is unknown, to predict which of them are most likely to increase C. elegans’ lifespan. The top hit compounds can be broadly divided into four groups: compounds affecting mitochondria, compounds for cancer treatment, anti-inflammatories, and compounds for gonadotropin-releasing hormone therapies.
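The classification setup described in the abstract can be sketched as follows. This is a minimal, self-contained illustration, not the authors' implementation: it assumes every feature is binary (1 if a compound's target proteins are annotated with a given GO term, else 0), leaves out the continuous chemical descriptors, and uses synthetic data in which one hypothetical feature determines the label. All function names (`fit_forest`, `vote_fraction`, and so on) are hypothetical. Each tree is grown on a bootstrap sample and considers a random subset of about sqrt(p) features at every split, which is the standard random forest recipe; the forest predicts by majority vote, and the fraction of trees voting "increases lifespan" can be used to rank unlabelled compounds, loosely analogous to screening DGIdb.

```python
import math
import random
from collections import Counter

# Minimal random forest for binary (0/1) features. Assumption: each feature
# marks whether a compound's target proteins carry a given GO term; label 1
# means "increases lifespan". Chemical descriptors are omitted in this sketch.

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, n_sub, rng, depth=0, max_depth=8):
    """Grow one tree; each split looks at a random subset of n_sub features."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority                                # leaf node: a class label
    best = None
    for f in rng.sample(range(len(X[0])), n_sub):
        left = [t for x, t in zip(X, y) if x[f] == 0]
        right = [t for x, t in zip(X, y) if x[f] == 1]
        if not left or not right:
            continue                                   # feature is constant here
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return majority
    f = best[1]
    parts = [[i for i, x in enumerate(X) if x[f] == v] for v in (0, 1)]
    children = [build_tree([X[i] for i in p], [y[i] for i in p],
                           n_sub, rng, depth + 1, max_depth) for p in parts]
    return (f, children[0], children[1])               # internal node

def predict_tree(node, x):
    """Follow the 0/1 branches until a leaf label is reached."""
    while isinstance(node, tuple):
        f, if_zero, if_one = node
        node = if_one if x[f] == 1 else if_zero
    return node

def fit_forest(X, y, n_trees=51, seed=0):
    rng = random.Random(seed)
    n_sub = max(1, int(math.sqrt(len(X[0]))))          # usual default: sqrt(p)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], n_sub, rng))
    return forest

def vote_fraction(forest, x):
    """Fraction of trees predicting label 1 (increases lifespan)."""
    return sum(predict_tree(t, x) for t in forest) / len(forest)

def predict_forest(forest, x):
    return 1 if vote_fraction(forest, x) > 0.5 else 0

# Synthetic data: 9 hypothetical GO-term features; the label simply copies
# feature 0, so the forest should recover that rule.
data_rng = random.Random(42)
X = [[data_rng.randint(0, 1) for _ in range(9)] for _ in range(300)]
y = [row[0] for row in X]
forest = fit_forest(X[:200], y[:200])
acc = sum(predict_forest(forest, x) == t for x, t in zip(X[200:], y[200:])) / 100
print(f"held-out accuracy: {acc:.2f}")

# Rank unlabelled compounds (stand-ins for DGIdb entries) by vote fraction.
unlabelled = [[data_rng.randint(0, 1) for _ in range(9)] for _ in range(5)]
ranked = sorted(unlabelled, key=lambda x: vote_fraction(forest, x), reverse=True)
```

In practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally be used; the hand-written version only makes the bootstrap, per-split feature subsampling and voting steps explicit.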

Highlights

  • Old age is the greatest risk factor for many diseases, including various types of cancer, inflammatory and neurodegenerative diseases

  • The random forest builds a classification model to predict whether or not a chemical compound will increase the lifespan of C. elegans, based on predictive features describing that compound

  • We use the random forest method as the classification algorithm to analyse this dataset. This type of method was chosen because it is popular in bioinformatics [21,22], it is robust to overfitting in datasets where the number of features is much larger than the number of instances [22,23], it is relatively simple to understand and to use, and in contrast to other state-of-the-art classification methods like support vector machines, random forests produce interpretable results based on a variable importance measure, an interpretation mechanism exploited in this paper
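The excerpt does not say which variable importance measure the study used. Random forests commonly report either the mean decrease in Gini impurity or the permutation-based mean decrease in accuracy; the sketch below shows the permutation variant, which only needs a fitted model's predict function. It is computed here on a held-out set, whereas Breiman's original formulation uses each tree's out-of-bag samples. The feature names (`term_0` to `term_29`) and the toy predictor are hypothetical placeholders, not real GO terms or the paper's model.

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows where predict(x) matches the true label."""
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    scores = []
    for f in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)                        # break feature-label link
            X_perm = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        scores.append(sum(drops) / n_repeats)
    return scores

# Hypothetical stand-ins: 30 binary features (not real GO identifiers) and a
# toy predictor used in place of a trained random forest.
names = [f"term_{i}" for i in range(30)]
data_rng = random.Random(1)
X = [[data_rng.randint(0, 1) for _ in names] for _ in range(200)]
y = [row[0] for row in X]

def toy_predict(x):
    return x[0]                                        # only uses term_0

scores = permutation_importance(toy_predict, X, y)
top20 = sorted(zip(names, scores), key=lambda p: p[1], reverse=True)[:20]
for name, score in top20[:3]:
    print(name, round(score, 3))
```

Sorting the scores and keeping the first 20 mirrors the "top 20 most important GO terms" ranking mentioned in the abstract; features the model ignores get an importance of exactly zero.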


Summary

Introduction

Old age is the greatest risk factor for many diseases, including various types of cancer, inflammatory and neurodegenerative diseases. From a whole-body systems point of view, the traditional one-disease-at-a-time approach focuses on these downstream diseases rather than on the underlying mechanisms of age-related functional decline. This approach has limited effectiveness at present and is likely to become less effective in the future, as a growing elderly population suffers from multiple age-related diseases. Promising research on pharmacological interventions in the ageing process is underway at the National Institute on Aging’s Interventions Testing Program (ITP), which administers drugs or chemical compounds to mice under carefully controlled conditions [4,5]. According to the GenAge database [9], C. elegans is the animal model with by far the most known ageing-related genes (838 at the time of writing).

