Abstract

Motivation: Large-scale phenotyping projects such as the Sanger Mouse Genetics project are ongoing efforts to help identify the influences of genes and their modification on phenotypes. Gene–phenotype relations are crucial to the improvement of our understanding of human heritable diseases as well as the development of drugs. However, given that there are ∼20 000 genes in higher vertebrate genomes and the experimental verification of gene–phenotype relations requires a lot of resources, methods are needed that determine good candidates for testing.

Results: In this study, we applied an association rule mining approach to the identification of promising secondary phenotype candidates. The predictions rely on a large gene–phenotype annotation set that is used to find occurrence patterns of phenotypes. Applying an association rule mining approach, we could identify 1967 secondary phenotype hypotheses that cover 244 genes and 136 phenotypes. Using two automated and one manual evaluation strategies, we demonstrate that the secondary phenotype candidates possess biological relevance to the genes they are predicted for. From the results we conclude that the predicted secondary phenotypes constitute good candidates to be experimentally tested and confirmed.

Availability: The secondary phenotype candidates can be browsed through at http://www.sanger.ac.uk/resources/databases/phenodigm/gene/secondaryphenotype/list.

Contact: ao5@sanger.ac.uk or ds5@sanger.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • A causative gene has not yet been identified for almost half of the existing human heritable diseases (Schofield et al, 2012)

  • Experimental results of mutagenesis experiments are stored in species-specific Model Organism Databases (MODs) (Leonelli and Ankeny, 2012), e.g. the Sanger Mouse Genetics Project (Sanger-MGP) (White et al, 2013), WormBase (Yook et al, 2012), the Mouse Genome Database (MGD) (Bult et al, 2012) or FlyBase (Drysdale and FlyBase Consortium, 2008)

  • In the bioinformatics domain, association rule mining has previously been applied successfully to large annotation sets with the aim of finding relationships between gene functions described with the Gene Ontology (GO) (Botstein et al, 2000; Kumar et al, 2004; Manda et al, 2012)



Introduction

A causative gene has not yet been identified for almost half of the existing human heritable diseases (Schofield et al, 2012). To find cures and prevention mechanisms for human genetic disorders, we need to comprehensively understand how each disease originates and progresses over time. A collection of human diseases together with confirmed and speculative causes is available from resources such as the Online Mendelian Inheritance in Man (OMIM) (Amberger et al, 2011) or Orphanet (Ayme, 2003) databases. In the quest to identify causative genes for human genetic disorders, model organisms have gained increasing importance due to the opportunities arising from targeted gene modifications. The mouse shares 99% of genes with humans, and gene modifications leading to phenotypes characteristic of a disease may offer clues to the origins of this disease (Rosenthal and Brown, 2007). Experimental results of mutagenesis experiments are stored in species-specific Model Organism Databases (MODs) (Leonelli and Ankeny, 2012), e.g. the Sanger Mouse Genetics Project (Sanger-MGP) (White et al, 2013), WormBase (Yook et al, 2012), the Mouse Genome Database (MGD) (Bult et al, 2012) or FlyBase (Drysdale and FlyBase Consortium, 2008).

