Genome-wide prediction and prioritization of human aging genes by data fusion: a machine learning approach

Masoud Arabfard, Mina Ohadi, Kaveh Kavousi, Vahid Rezaei Tabar, Ahmad Delbari

doi:10.1186/s12864-019-6140-0

Abstract

Background: Machine learning can effectively nominate novel genes for various research purposes in the laboratory. On a genome-wide scale, we combined multiple databases and algorithms to predict and prioritize the human aging genes (PPHAGE).

Results: We fused data from 11 databases, and used a Naïve Bayes classifier and the positive unlabeled learning (PUL) methods NB, Spy, and Rocchio-SVM to rank human genes with respect to their implication in aging. The PUL methods enabled us to identify a list of negative (non-aging) genes to use alongside the seed (known age-related) genes in the ranking process. Comparison of the PUL algorithms revealed that none of the methods for identifying negative samples was advantageous over the others, and that their simultaneous use in the form of fusion was critical for obtaining optimal results (PPHAGE is publicly available at https://cbb.ut.ac.ir/pphage).

Conclusion: We predict and prioritize over 3,000 candidate age-related genes in humans, based on significant ranking scores. The identified candidate genes are associated with pathways, ontologies, and diseases that are linked to aging, such as cancer and diabetes. Our data offer a platform for future experimental research on the genetic and biological aspects of aging. Additionally, we demonstrate that fusion of PUL methods and data sources can be successfully used for aging and disease candidate gene prioritization.

Highlights

Machine learning can effectively nominate novel genes for various research purposes in the laboratory
We examined existing machine learning methods for identifying human non-aging genes, and built a binary classifier that predicts novel candidate genes from the positive and negative gene sets
Three positive unlabeled learning (PUL) algorithms, Naïve Bayes (NB), Spy, and Rocchio-SVM, were used to evaluate the underlying data, and their performance was compared across the eight datasets introduced
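
To make the idea of extracting negative (non-aging) genes from unlabeled data more concrete, here is a minimal sketch of the first step of a Rocchio-style PUL method: prototype vectors are built for the positive and unlabeled sets, and unlabeled genes that lie closer to the negative prototype are kept as "reliable negatives". This is an illustration only, not the PPHAGE implementation. The gene names, the two-dimensional feature vectors, and the weights alpha=16 and beta=4 (values commonly used with Rocchio classifiers) are all assumptions, and the subsequent SVM training step is omitted.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 if either vector has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def rocchio_reliable_negatives(positives, unlabeled, alpha=16.0, beta=4.0):
    """Return unlabeled genes that are closer to the negative prototype.

    positives and unlabeled map a gene name to its feature vector.
    The unlabeled set is treated as a provisional negative class.
    """
    pos = [normalize(v) for v in positives.values()]
    unl = {gene: normalize(v) for gene, v in unlabeled.items()}
    mean_p = mean_vector(pos)
    mean_u = mean_vector(list(unl.values()))
    # Rocchio prototypes: each class mean, pushed away from the other class.
    proto_pos = [alpha * p - beta * u for p, u in zip(mean_p, mean_u)]
    proto_neg = [alpha * u - beta * p for p, u in zip(mean_p, mean_u)]
    return sorted(
        gene for gene, v in unl.items()
        if cosine(proto_neg, v) > cosine(proto_pos, v)
    )


# Hypothetical toy data: two features per gene, invented for illustration.
seed_genes = {"SEED_1": [1.0, 0.1], "SEED_2": [0.9, 0.2]}
unlabeled_genes = {
    "GENE_A": [0.95, 0.15],  # resembles the seed genes
    "GENE_B": [0.10, 1.00],
    "GENE_C": [0.20, 0.90],
}

reliable_negatives = rocchio_reliable_negatives(seed_genes, unlabeled_genes)
print(reliable_negatives)
```

In this toy example GENE_A resembles the seed genes and is left unlabeled, while GENE_B and GENE_C are returned as reliable negatives. In a full two-step PUL pipeline, such a negative set would then be used together with the seed genes to train a binary classifier.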

Summary

Introduction

Machine learning can effectively nominate novel genes for various research purposes in the laboratory. Biologists apply computational and mathematical methods and algorithms to develop machine learning approaches for identifying novel candidate disease genes [3]. Most methods of predicting candidate genes employ various biological data, such as protein sequence, functional annotation, gene expression, protein-protein interaction networks, regulatory data, and even orthogonal and conservation data, to identify similarities, following the principle of association based on similarity [5]. These methods are categorized as unsupervised, supervised, and semi-supervised [6]. Supervised methods create a boundary between disease genes and non-disease genes, and utilize this boundary

Journal: BMC Genomics
Publication Date: Nov 9, 2019
Citations: 11
License type: open-access
