Abstract

A major challenge in biomedicine is finding the genetic causes of human diseases, and researchers often face a large number of candidate genes. Gene prioritization methods offer valuable support by guiding researchers toward reliable candidate causative genes for the disease under study: they rank genes according to their association with the disease of interest. However, most genetic disorders have few or no causative genes associated with them; this induces a strong labeling imbalance in the corresponding ranking problems, so reliable solutions require imbalance-aware techniques. We propose an expressly designed imbalance-aware methodology for prioritizing genes, which first rebalances the training set through a negative selection procedure and then applies a learning algorithm 'sensitive' to the misclassification of positive instances to produce the gene ranking. The algorithm has a reduced time complexity, which makes its application to large datasets feasible. Validation of this methodology showed that it is competitive with state-of-the-art techniques on a benchmark of 708 selected Medical Subject Headings diseases, and it suggested some putative novel gene-disease associations.

Highlights

  • The discovery of so-called disease genes, whose disruption causes congenital or acquired diseases, is important for both diagnosis and new therapies, through the elucidation of disease etiology

  • We first describe the results of the comparison between the state-of-the-art methodologies and the methods described in Section 3.2; we then show the results of their extensions based on the negative selection procedures presented in Sections 3.3 and 3.4

  • The selection procedure is based on the observation that a non-positive point ranked below a suitably fixed threshold τ > 0 cannot be reasonably attributed to any cluster of positive points, whereas the nodes corresponding to points in D−,τ = {∆ ∈ D− | σ(∆) ≥ τ} are the ones to be filtered out before further processing (Figure 1(c)). We highlight that this filtering process, despite adding a new parameter, can significantly lower the ranking complexity obtained at the end of Section 3.3 by reducing |D−|. This variant of the proposed model is depicted in Figure 1: instances are projected onto a two-dimensional space (Figure 1(a)); negative instances are ranked according to a clustering of the positive examples (Figure 1(b)); a selection procedure is applied to the negative examples (Figure 1(c)); and a generalized linear model (GLM) is learned to separate the retained projected
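
The steps in the last highlight can be illustrated with a minimal Python sketch. It is not the authors' implementation: this excerpt does not define the scoring function σ, so the sketch assumes a hypothetical score (a Gaussian similarity to the nearest positive centroid), a single positive cluster, toy two-dimensional points, and a positive-class weight equal to the ratio of retained negatives to positives. Logistic regression stands in for the GLM, and all function names are illustrative.

```python
import math

def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def sigma(point, centroids):
    """Hypothetical score in (0, 1]: similarity to the nearest positive centroid.
    The paper's actual sigma is not given in this excerpt."""
    d2 = min((point[0] - c[0]) ** 2 + (point[1] - c[1]) ** 2 for c in centroids)
    return math.exp(-d2)

def select_negatives(negatives, centroids, tau):
    """Filter out negatives with sigma >= tau; keep those scoring below tau."""
    return [p for p in negatives if sigma(p, centroids) < tau]

def fit_weighted_logistic(X, y, pos_weight, lr=0.1, epochs=2000):
    """Logistic regression (a GLM with logit link) fitted by batch gradient descent.
    Errors on positive instances are multiplied by pos_weight (an assumed scheme)."""
    w0 = w1 = b = 0.0
    n = len(X)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for (x0, x1), label in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(w0 * x0 + w1 * x1 + b)))
            err = (pos_weight if label == 1 else 1.0) * (p - label)
            g0 += err * x0
            g1 += err * x1
            gb += err
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
        b -= lr * gb / n
    return w0, w1, b

def score(point, model):
    """Predicted probability of the positive class, used as the ranking score."""
    w0, w1, b = model
    return 1.0 / (1.0 + math.exp(-(w0 * point[0] + w1 * point[1] + b)))

# Toy projected instances (made up for illustration).
positives = [(2.0, 2.0), (2.2, 1.8), (1.8, 2.2)]
negatives = [(0.0, 0.0), (0.2, -0.1), (-0.3, 0.1), (0.1, 0.4), (1.9, 2.1)]

centroids = [centroid(positives)]          # one positive cluster, for simplicity
kept = select_negatives(negatives, centroids, tau=0.5)

X = positives + kept
y = [1] * len(positives) + [0] * len(kept)
model = fit_weighted_logistic(X, y, pos_weight=len(kept) / len(positives))

# Rank unseen candidates by decreasing score.
candidates = [(0.1, 0.1), (2.1, 1.9)]
ranking = sorted(candidates, key=lambda p: score(p, model), reverse=True)
print(kept)
print(ranking)
```

In this toy data the negative at (1.9, 2.1) lies inside the positive cluster, so its score exceeds τ and it is removed before training; the remaining negatives are far from the cluster and are retained. Raising τ keeps more negatives, while lowering it shrinks the retained negative set and the cost of the subsequent fit.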


Summary

Introduction

The discovery of so-called disease genes, whose disruption causes congenital or acquired diseases, is important for both diagnosis and new therapies, through the elucidation of disease etiology. Finding the causal gene(s) among these candidates is an expensive and time-consuming process, which requires extensive laboratory experiments [3]. Because this list of genes is very large, it can be only partially handled by manual curators. Several strategies have emerged to rank the variants and the genes that they affect, with those most likely to cause disease ranked highest, through a process termed gene prioritization (GP) [4]. This task is extremely challenging since complex genetic disorders often involve several primarily responsible genes; in most cases there has been only limited success in identifying causative genes.

