Predicting genome-wide redundancy using machine learning

Huang-Wen Chen, Sunayan Bandyopadhyay, Dennis E Shasha, Kenneth D Birnbaum

doi:10.1186/1471-2148-10-357

Abstract

Background: Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods to increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insights into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data is available, such as Arabidopsis thaliana, the test case used here.

Results: Machine learning techniques that combine multiple attributes led to a dramatic improvement in predicting genetic redundancy over single trait classifiers alone, such as BLAST E-values or expression correlation. In withholding analysis, one of the methods used here, Support Vector Machines, was two-fold more precise than single attribute classifiers, reaching a level where the majority of redundant calls were correctly labeled. Using this higher confidence in identifying redundancy, machine learning predicts that about half of all genes in Arabidopsis showed the signature of predicted redundancy with at least one but typically less than three other family members. Interestingly, a large proportion of predicted redundant gene pairs were relatively old duplications (e.g., Ks > 1), suggesting that redundancy is stable over long evolutionary periods.

Conclusions: Machine learning predicts that most genes will have a functionally redundant paralog but will exhibit redundancy with relatively few genes within a family. The predictions and gene pair attributes for Arabidopsis provide a new resource for research in genetics and genome evolution. These techniques can now be applied to other organisms.
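The idea of combining several gene-pair attributes in one classifier and scoring it on withheld pairs can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two attributes (a scaled sequence-similarity score and an expression correlation), the toy values, the function names, and the training procedure (a linear SVM fitted by plain subgradient descent on the hinge loss). It is not the authors' implementation, attribute set, or data.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy gene pairs: (sequence similarity in [0, 1], expression correlation).
# Label +1 = redundant pair, -1 = non-redundant pair. Values are invented.
train_X: List[Tuple[float, float]] = [
    (0.90, 0.80), (0.80, 0.90), (0.85, 0.70),   # labeled redundant
    (0.20, 0.10), (0.30, -0.20), (0.10, 0.20),  # labeled non-redundant
]
train_y: List[int] = [1, 1, 1, -1, -1, -1]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Fit a linear SVM by batch subgradient descent on the
    L2-regularised hinge loss. Labels must be +1 or -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1.0:  # point violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Multi-attribute call: +1 (redundant) if the SVM score is positive."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1


def predict_single(x: Sequence[float], threshold: float = 0.5) -> int:
    """Single-attribute baseline: threshold on sequence similarity only."""
    return 1 if x[0] > threshold else -1


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of 'redundant' calls that are truly redundant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == -1)
    return tp / (tp + fp) if (tp + fp) else 0.0


w, b = train_linear_svm(train_X, train_y)

# Withheld pairs (also invented) that were not used for training.
held_X = [(0.88, 0.75), (0.25, 0.05), (0.70, -0.10), (0.40, 0.90)]
held_y = [1, -1, -1, 1]

held_out_precision = precision(held_y, [predict(w, b, x) for x in held_X])
baseline_precision = precision(held_y, [predict_single(x) for x in held_X])
print("multi-attribute precision:", held_out_precision)
print("single-attribute precision:", baseline_precision)
```

Precision is the natural metric here because the abstract's claim concerns how many "redundant" calls are correct. With real data, the withheld set would be drawn from the known redundant and non-redundant pairs rather than written by hand, and the numbers printed above say nothing about the paper's results.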

Highlights

Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses
We develop tools to improve the analysis of genetic redundancy by (1) creating a database of comparative information on gene pairs based on sequence and expression characteristics and (2) predicting genetic redundancy genome-wide using machine learning trained on known cases of genetic redundancy
There is a basis for asking whether combinations of gene pair attributes could be used to improve the prediction of genetic redundancy

Summary

Introduction

Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data are available, such as Arabidopsis thaliana, the test case used here. In Arabidopsis, about 80% of genes have a paralog in the genome, and many individual cases of redundancy among paralogs are known [2,3,4]. Genetic redundancy is not the rule, however, as many paralogous genes have highly divergent functions. Mutant analysis by targeted gene disruption (reverse genetics) is a powerful technique for analyzing the function of genes implicated in specific processes.


Journal: BMC Evolutionary Biology
Publication Date: Nov 18, 2010
Citations: 65
License type: cc-by
