Abstract

Objective: Candidate disease gene identification has attracted the attention of many researchers because of its significant role in bioinformatics. In this review, we present several classifications of recent identification approaches and their datasets.

Methods/Findings: The approaches are classified into five categories and the datasets into two categories; some categories are further divided into several types. In each category, we explain every approach in terms of its objectives, mechanism, datasets and results. The approaches use different algorithms, such as random walk algorithms, machine-learning algorithms and genetic algorithms. The most common way to test performance is cross-validation, evaluated with precision, recall and F1 metrics. During our research, we found that many methods are novel and that some networks and algorithms show a noticeable improvement. We also noticed that the major emphasis was on enhancing genome datasets through mechanisms such as integrating them or adding new features. Most researchers focus on this aspect because they believe that the best way to improve gene prioritization and identification, and to obtain more accurate results, is to have a reliable dataset that includes all the required information.

Application: This survey can be a valuable source of information. It explains and summarizes every item in the classification in a simple and understandable way. It can therefore guide researchers concerned with disease gene identification to the different techniques and datasets used in this field.

Keywords: Biological and Topological Properties, Disease Gene Identification, Gene Expression, Gene Ontology, Phenotype, Protein Interaction Networks

Highlights

  • Disease candidate gene prediction is a critical part of biomedical research[1]

  • The number of known disease genes is small compared to the huge set of unlabeled or unknown genes, which makes it difficult for many methods, such as machine learning techniques, to learn from this limited set[4] and to produce an unbiased set of candidate genes

  • The results showed that the last proposed algorithm achieved higher performance than the RWRDP, RWRHN and RWRDPPLLR algorithms


Summary

Introduction

State-of-the-art in Candidate Disease Genes Prioritization and Prediction Approaches and Techniques

Disease candidate gene prediction is a critical part of biomedical research[1]. [...] molecular level to understand the mechanisms leading to the disease remains a challenge in biomedical science[3]. Many methodologies have tried to understand the mechanism of these known disease genes and the complex interplay between them and their proteins[3]. They integrated a variety of molecular networks, such as gene expression, genetic linkage, protein-protein interactions and gene-phenotype associations.

We illustrate a number of different approaches for identifying genes associated with complex, non-complex, specific and general diseases. We divide these approaches into five categories based on the techniques used to enhance and exploit datasets and to develop more efficient ranking algorithms. All of these categories are closely related to one another, and we believe that future methods will be developed with these approaches as their foundation.
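
To make the idea of a network-based ranking algorithm concrete, the following is a minimal sketch of a generic random walk with restart over a small protein interaction network. It is not the implementation of RWRDP, RWRHN or any other method covered by the review. The gene names `G1` to `G5`, the toy network, the restart probability of 0.3 and the function names are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node by its proximity to the seed nodes.

    adjacency: dict mapping each node to the set of its neighbours (undirected).
    seeds: set of known disease genes (hypothetical labels in this sketch).
    """
    nodes = sorted(adjacency)
    # Restart distribution: probability mass spread evenly over the seeds.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # Isolated node: the walker has nowhere to go and stays put.
                nxt[n] += (1.0 - restart) * p[n]
                continue
            # Otherwise the walker moves to a uniformly chosen neighbour.
            share = (1.0 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def rank_candidates(scores, seeds):
    """Return the non-seed genes ordered from highest to lowest score."""
    candidates = [g for g in scores if g not in seeds]
    return sorted(candidates, key=lambda g: scores[g], reverse=True)


# Toy interaction network (assumed for illustration only).
toy_network = {
    "G1": {"G2", "G3"},
    "G2": {"G1", "G3"},
    "G3": {"G1", "G2", "G4"},
    "G4": {"G3", "G5"},
    "G5": {"G4"},
}

if __name__ == "__main__":
    known_disease_genes = {"G1"}
    scores = random_walk_with_restart(toy_network, known_disease_genes)
    for gene in rank_candidates(scores, known_disease_genes):
        print(gene, round(scores[gene], 4))
```

Genes that are well connected to the seed set receive higher steady-state scores, which is the intuition behind using such walks to prioritize candidates. In practice the network would come from an integrated dataset rather than a hand-written dictionary.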

