Abstract
The genes that are required for organismal survival are annotated as ‘essential genes’. Identifying all the essential genes of an animal species can reveal critical functions that are needed during the development of the organism. To inform studies on mouse development, we developed a supervised machine learning classifier based on phenotype data from mouse knockout experiments. We used this classifier to predict the essentiality of mouse genes lacking experimental data. Validation of our predictions against a blind test set of recent mouse knockout experimental data indicated a high level of accuracy (>80%). We also validated our predictions for other mouse mutagenesis methodologies, demonstrating that the predictions are accurate for lethal phenotypes isolated in random chemical mutagenesis screens and embryonic stem cell screens. The biological functions that are enriched in essential and non-essential genes have been identified, showing that essential genes tend to encode intracellular proteins that interact with nucleic acids. The genome distribution of predicted essential and non-essential genes was analysed, demonstrating that the density of essential genes varies throughout the genome. A comparison with human essential and non-essential genes was performed, revealing conservation between human and mouse gene essentiality status. Our genome-wide predictions of mouse essential genes will be of value for the planning of mouse knockout experiments and phenotyping assays, for understanding the functional processes required during mouse development, and for the prioritisation of disease candidate genes identified in human genome and exome sequence datasets.
Highlights
Essential genes are those that are required for the survival of an organism
Our classifier is more accurate than a support vector machine classifier of human essential genes that was evaluated in jackknife tests and by 10-fold cross-validation (Yang et al, 2014)
A strength of our study is the use of 2 blind test sets to further interrogate the validity of our classifier, which differs from other prior research generating
Summary
Essential genes are those that are required for the survival of an organism. Although studies in unicellular organisms, such as yeast, have experimentally defined the set of essential genes in those species (Kofoed et al, 2015), the large genome size and developmental complexity of animal models have hindered a comprehensive experimental essentiality analysis in these organisms. Knowledge of essential genes in animal species is informative for understanding the biological functions required during development, as well as for identifying candidate genes for human genetic diseases. Mouse knockout experiments have proved useful in identifying a subset of mammalian essential genes (Sung et al, 2012); however, the entirety of the mouse genome has not yet been experimentally examined. To optimise knockout experiment design, machine learning algorithms (Yuan et al, 2012) have been used to predict the essentiality of mouse genes based on their genomic features. Such predictions can also aid in the identification of candidate genes for human genetic diseases, owing to the close genetic and physiological similarities between mouse and human (Rosenthal and Brown, 2007). Machine learning methods are also useful in identifying features associated with gene essentiality (Kabir et al, 2017).
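The validation strategy mentioned above (10-fold cross-validation followed by evaluation on a blind test set) can be illustrated with a minimal sketch. This is not the classifier used in the study: the nearest-centroid model is a deliberately simple stand-in, the two numeric features and all data are synthetic, and every function name is hypothetical.

```python
import random

def make_synthetic_genes(n_per_class, seed):
    """Synthetic stand-in data: label 1 = essential, 0 = non-essential.
    Each 'gene' has two made-up numeric features (an assumption for illustration)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)])
            y.append(label)
    return X, y

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_centroids(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def accuracy(centroids, X, y):
    """Fraction of genes whose predicted label matches the known label."""
    return sum(predict(centroids, x) == l for x, l in zip(X, y)) / len(y)

def cross_validate(X, y, k=10, seed=0):
    """Mean held-out accuracy over k folds; each fold is held out exactly once."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(model,
                               [X[i] for i in held_out],
                               [y[i] for i in held_out]))
    return sum(scores) / k

# Training set (used for cross-validation) and a separate blind test set
# that plays no part in fitting or model selection.
X_train, y_train = make_synthetic_genes(100, seed=1)
X_blind, y_blind = make_synthetic_genes(50, seed=2)

cv_accuracy = cross_validate(X_train, y_train, k=10)
final_model = fit_centroids(X_train, y_train)
blind_accuracy = accuracy(final_model, X_blind, y_blind)

print(f"10-fold CV accuracy: {cv_accuracy:.2f}")
print(f"Blind test accuracy: {blind_accuracy:.2f}")
```

The blind set is kept apart from the cross-validation data, so its accuracy estimates performance on genes the model has never seen, which is the role the blind test sets play in the text.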