Sensing the cilium, digital capture of ciliary data for comparative genomics investigations

Abstract

Background
Cilia are specialized, hair-like structures that project from the cell bodies of eukaryotic cells. As understanding of the distribution and functions of the various types of cilia grows, interest in these organelles is accelerating. To make effective use of this expanding knowledge, the information must be digitally accessible and available for large-scale analytical and computational investigation. The objective of this work is to capture knowledge about cilia and integrate it into existing knowledge bases, thereby improving comparative genomic data analysis.
Methods
We focused on capturing information about cilia as studied in the laboratory mouse, a primary model of human biology. The workflow we developed establishes a standard for capturing comparative functional data relevant to human biology. We identified the 310 closest mouse orthologs of the 302 human genes defined in the SYSCILIA Gold Standard set of ciliary genes. For these mouse genes, we identified biomedical literature for curation and applied Gene Ontology (GO) curation paradigms to derive functional annotations from these publications.
Results
Using a methodology for comprehensive capture of experimental data about cilia genes in structured, digital form, we established a workflow for curating experimental literature that details the molecular functions and roles of cilia proteins, starting with the mouse orthologs of the human SYSCILIA gene set. We worked closely with the GO Consortium ontology development editors and the SYSCILIA Consortium to improve the representation of ciliary biology within the GO. During the ontology improvement project, we fully curated 134 of the 310 mouse genes, increasing the number of ciliary and other experimental annotations.
Conclusions
We have improved the GO annotations available for mouse genes orthologous to the human genes in the SYSCILIA Consortium’s Gold Standard set. In addition, ciliary terminology in the GO itself was improved in collaboration with GO ontology developers and the SYSCILIA Consortium. These improvements to the GO terms for the functions and roles of ciliary proteins, together with the increase in annotations of the corresponding genes, enhance the representation of ciliary processes and localizations and improve access to these data during large-scale bioinformatic analyses.
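
Measuring a gain in experimental annotations, as reported above, comes down to counting annotation rows by evidence code. The following is a minimal sketch, assuming the annotations are available as tab-separated GAF 2.x rows (column 3 is the gene symbol, column 7 the evidence code) and treating the GO experimental and high-throughput experimental codes as "experimental". The function name and the sample rows are hypothetical placeholders, not data or code from this study.

```python
from collections import Counter
from typing import Iterable

# GO experimental evidence codes plus the high-throughput experimental codes.
EXPERIMENTAL_CODES = {
    "EXP", "IDA", "IPI", "IMP", "IGI", "IEP",
    "HTP", "HDA", "HMP", "HGI", "HEP",
}


def count_experimental(gaf_lines: Iterable[str]) -> Counter:
    """Count experimentally supported annotations per gene symbol.

    Assumes GAF 2.x rows: tab-separated, '!' marks header lines,
    column 3 = gene symbol, column 7 = evidence code.
    """
    counts: Counter = Counter()
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue  # skip malformed rows
        symbol, evidence = cols[2], cols[6]
        if evidence in EXPERIMENTAL_CODES:
            counts[symbol] += 1
    return counts


# Hypothetical placeholder rows, truncated after column 9 for brevity.
rows = [
    "!gaf-version: 2.2",
    "MGI\tMGI:0000001\tGeneA\tlocated_in\tGO:0005929\tPMID:0\tIDA\t\tC",
    "MGI\tMGI:0000001\tGeneA\tlocated_in\tGO:0005929\tGO_REF:0\tIEA\t\tC",
    "MGI\tMGI:0000002\tGeneB\tinvolved_in\tGO:0060271\tPMID:0\tIMP\t\tP",
]
print(count_experimental(rows))  # Counter({'GeneA': 1, 'GeneB': 1})
```

Running the same count on annotation files from before and after a curation pass would give the per-gene change; the IEA row is ignored because it is an electronic, not experimental, annotation.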

Similar Papers
  • Research Article
  • Citations: 172
  • 10.1038/msb4100043
Global analysis of gene function in yeast by quantitative phenotypic profiling
  • Jan 1, 2006
  • Molecular Systems Biology
  • James A Brown + 8 more

We present a method for the global analysis of the function of genes in budding yeast based on hierarchical clustering of the quantitative sensitivity profiles of the 4756 strains with individual homozygous deletion of nonessential genes to a broad range of cytotoxic or cytostatic agents. This method is superior to other global methods of identifying the function of genes involved in the various DNA repair and damage checkpoint pathways as well as other interrogated functions. Analysis of the phenotypic profiles of the 51 diverse treatments places a total of 860 genes of unknown function in clusters with genes of known function. We demonstrate that this can not only identify the function of unknown genes but can also suggest the mechanism of action of the agents used. This method will be useful when used alone and in conjunction with other global approaches to identify gene function in yeast.
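
The core idea, grouping deletion strains whose sensitivity profiles correlate so that genes of unknown function land next to genes of known function, can be sketched in plain Python. This is a minimal illustration assuming average linkage on a 1 minus Pearson correlation distance; the abstract does not state which linkage or distance the authors used, and the strain names and scores below are invented.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def average_linkage(profiles: Dict[str, Sequence[float]], n_clusters: int) -> List[Set[str]]:
    """Agglomerative clustering with average linkage on 1 - Pearson r."""
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }
    clusters: List[FrozenSet[str]] = [frozenset([name]) for name in profiles]

    def cluster_dist(c1: FrozenSet[str], c2: FrozenSet[str]) -> float:
        # Mean pairwise distance between members of the two clusters.
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    # Repeatedly merge the closest pair of clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_dist(*p))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
    return [set(c) for c in clusters]


# Hypothetical sensitivity scores for five deletion strains across four treatments.
profiles = {
    "repairA": [5.0, 4.0, 0.0, 0.0],
    "repairB": [4.5, 4.2, 0.1, 0.0],
    "orfX": [5.1, 3.9, 0.2, 0.1],
    "sterolA": [0.0, 0.0, 4.0, 5.0],
    "sterolB": [0.1, 0.0, 4.5, 4.8],
}
print(average_linkage(profiles, 2))
```

With these toy profiles the uncharacterized orfX clusters with the two repair strains, which is the kind of functional hint the abstract describes. A real analysis would use an optimized library implementation rather than this quadratic loop.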

  • Research Article
  • Citations: 30
  • 10.1038/npre.2009.3154.1
The Gene Ontology Annotation (GOA) Database
  • Apr 23, 2009
  • Nature Precedings
  • Rachael Huntley + 4 more

The Gene Ontology (GO) is a well-established, structured vocabulary that has been successfully used for 10 years in the annotation of proteins. GO terms, created in consultation with the biology community, are used to replace the multiple nomenclatures used by scientific databases that can hamper data integration. Currently GO consists of more than 26,500 terms distributed over three ontologies that describe the molecular function, biological process and subcellular location of a protein in a generic cell. The Gene Ontology Annotation (GOA) database (http://www.ebi.ac.uk/GOA) aims to provide high-quality manual and electronic GO annotations to proteins within the UniProt Knowledgebase (UniProtKB). By annotating all ‘known’ proteins with GO terms and transferring this knowledge to highly similar ‘unknown’ proteins, GOA offers a valuable contribution to the understanding of all proteomes. As well as generating manual annotation, made by extracting experimental evidence from full-text peer-reviewed publications, GOA produces electronic annotation by making large-scale assignments of GO terms to proteins using computational methods. To date we have six electronic annotation methods, including InterPro2GO, Swiss-Prot Keyword2GO and the projection of annotations between orthologous species using Ensembl Compara. GOA provides annotated entries for over 180,000 species and is the largest and most comprehensive open-source contributor of annotations to the GO Consortium annotation effort. In addition, by integrating GO annotations from model organism groups (e.g. FlyBase, GeneDB, MGI, RGD, SGD and TAIR), GOA ensures the dataset remains a key reference.
GOA prioritises the annotation of the human proteome and provides this annotation to the GO Consortium’s Reference Genome project. GOA produces monthly releases of annotations to the human, mouse, rat, zebrafish, cow, chicken and Arabidopsis proteomes as well as a file for the multiple species within UniProtKB. The GOA dataset can be queried through a user-friendly web interface via our QuickGO browser (http://www.ebi.ac.uk/QuickGO) or downloaded in a parsable format via the EBI (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa) and GO FTP sites. The GOA dataset has increasingly been integrated into tools that aid in the analysis of large datasets resulting from high-throughput experiments, thus assisting researchers in the biological interpretation of their results.
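
The orthology projection method mentioned above relies on Ensembl Compara, and the abstract does not give its exact transfer rules. The following hedged sketch shows only the general idea, under stated assumptions: only experimentally supported source annotations are transferred, only across strictly one-to-one orthologs, and each transferred annotation is tagged IEA. Gene and term identifiers are placeholders.

```python
from typing import Dict, List, Set, Tuple

Annotation = Tuple[str, str, str]  # (gene, GO term, evidence code)

# Assumption: only experimentally supported source annotations are transferred.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def project_annotations(source: List[Annotation],
                        orthologs: Dict[str, Set[str]]) -> List[Annotation]:
    """Transfer experimental annotations across one-to-one orthologs as IEA."""
    # How many source genes point at each target gene (to detect many-to-one).
    target_hits: Dict[str, int] = {}
    for targets in orthologs.values():
        for t in targets:
            target_hits[t] = target_hits.get(t, 0) + 1

    projected = set()
    for gene, term, code in source:
        if code not in EXPERIMENTAL:
            continue  # electronic or other non-experimental evidence
        targets = orthologs.get(gene, set())
        if len(targets) != 1:
            continue  # unmapped or one-to-many
        (target,) = targets
        if target_hits[target] == 1:  # skip many-to-one
            projected.add((target, term, "IEA"))
    return sorted(projected)


source = [
    ("hsA", "GO:term1", "IDA"),
    ("hsA", "GO:term2", "IEA"),  # not experimental: not transferred
    ("hsB", "GO:term3", "IMP"),  # hsB has two orthologs: skipped
    ("hsC", "GO:term4", "IDA"),  # hsC shares its ortholog with hsD: skipped
]
orthologs = {"hsA": {"mmA"}, "hsB": {"mmB1", "mmB2"}, "hsC": {"mmC"}, "hsD": {"mmC"}}
print(project_annotations(source, orthologs))  # [('mmA', 'GO:term1', 'IEA')]
```

Restricting transfer to one-to-one pairs is a conservative choice: paralog expansions on either side make it unclear which copy retains the original function.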

  • Research Article
  • Citations: 46
  • 10.1093/database/bau074
BC4GO: a full-text corpus for the BioCreative IV GO task
  • Jul 28, 2014
  • Database: The Journal of Biological Databases and Curation
  • Kimberly Van Auken + 14 more

Gene function curation via Gene Ontology (GO) annotation is a common task among Model Organism Database groups. Owing to its manual nature, this task is considered one of the bottlenecks in literature curation. There have been many previous attempts at automatic identification of GO terms and supporting information from full text. However, few systems have delivered an accuracy that is comparable with humans. One recognized challenge in developing such systems is the lack of marked sentence-level evidence text that provides the basis for making GO annotations. We aim to create a corpus that includes the GO evidence text along with the three core elements of GO annotations: (i) a gene or gene product, (ii) a GO term and (iii) a GO evidence code. To ensure our results are consistent with real-life GO data, we recruited eight professional GO curators and asked them to follow their routine GO annotation protocols. Our annotators marked up more than 5000 text passages in 200 articles for 1356 distinct GO terms. For evidence sentence selection, the inter-annotator agreement (IAA) results are 9.3% (strict) and 42.7% (relaxed) in F1-measures. For GO term selection, the IAAs are 47% (strict) and 62.9% (hierarchical). Our corpus analysis further shows that abstracts contain ∼10% of the relevant evidence sentences and 30% of the distinct GO terms, while the Results/Experiment section has nearly 60% of the relevant sentences and >70% of the GO terms. Further, of those evidence sentences found in abstracts, less than one-third contain enough experimental detail to fulfill the three core criteria of a GO annotation. This result demonstrates the need for using full-text articles for text mining GO annotations. Through its use at the BioCreative IV GO (BC4GO) task, we expect our corpus to become a valuable resource for the BioNLP research community.
Database URL: http://www.biocreative.org/resources/corpora/bc-iv-go-task-corpus/
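
The abstract reports strict and relaxed F1 for inter-annotator agreement but does not define the matching rules. A minimal sketch, assuming evidence passages are half-open character spans, "strict" means the two annotators marked identical spans, and "relaxed" means any character overlap counts; the span values are made up for illustration.

```python
from typing import Iterable, Set, Tuple

Span = Tuple[int, int]  # half-open character offsets [start, end)


def strict_f1(a: Set[Span], b: Set[Span]) -> float:
    """F1 where a passage counts as agreed only if both annotators marked the identical span."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def _overlapping(spans: Iterable[Span], others: Iterable[Span]) -> int:
    # Number of spans that overlap at least one span in `others`.
    others = list(others)
    return sum(any(s < e2 and s2 < e for s2, e2 in others) for s, e in spans)


def relaxed_f1(a: Set[Span], b: Set[Span]) -> float:
    """F1 where any character overlap counts as agreement (annotator A as reference)."""
    if not a or not b:
        return 1.0 if not a and not b else 0.0
    recall = _overlapping(a, b) / len(a)
    precision = _overlapping(b, a) / len(b)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


a = {(0, 10), (20, 30)}
b = {(5, 15), (40, 50)}
print(strict_f1(a, b), relaxed_f1(a, b))  # 0.0 0.5
```

The gap between the two scores in this toy case mirrors the large gap the authors report (9.3% strict versus 42.7% relaxed): curators often agree on roughly where the evidence is but not on its exact boundaries.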

  • Research Article
  • Citations: 99
  • 10.1093/nar/gkl936
AgBase: a unified resource for functional analysis in agriculture
  • Nov 29, 2006
  • Nucleic Acids Research
  • Fiona M Mccarthy + 6 more

Analysis of functional genomics (transcriptomics and proteomics) datasets is hindered in agricultural species because agricultural genome sequences have relatively poor structural and functional annotation. To facilitate systems biology in these species we have established the curated, web-accessible, public resource ‘AgBase’ (). We have improved the structural annotation of agriculturally important genomes by experimentally confirming the in vivo expression of electronically predicted proteins and by proteogenomic mapping. Proteogenomic data are available from the AgBase proteogenomics link. We contribute Gene Ontology (GO) annotations and we provide a two-tier system of GO annotations for users. The ‘GO Consortium’ gene association file contains the most rigorous GO annotations based solely on experimental data. The ‘Community’ gene association file contains GO annotations based on expert community knowledge (annotations based directly on author statements and submitted annotations from the community) and annotations for predicted proteins. We have developed two tools for proteomics analysis and these are freely available on request. A suite of tools for analyzing functional genomics datasets using the GO is available online at the AgBase site. We encourage and publicly acknowledge GO annotations from researchers and provide an online mechanism for agricultural researchers to submit requests for GO annotations.

  • Supplementary Content
  • Citations: 35
  • 10.1186/1471-2180-9-s1-s8
Gene Ontology annotation of the rice blast fungus, Magnaporthe oryzae
  • Feb 1, 2009
  • BMC Microbiology
  • Shaowu Meng + 7 more

Background
Magnaporthe oryzae is the causal agent of blast disease, the most destructive disease of rice worldwide. The genome of this fungal pathogen has been sequenced and an automated annotation has recently been updated to Version 6 . However, a comprehensive manual curation remains to be performed. Gene Ontology (GO) annotation is a valuable means of assigning functional information using a standardized vocabulary. We report an overview of the GO annotation for Version 5 of the M. oryzae genome assembly.
Methods
A similarity-based (i.e., computational) GO annotation with manual review was conducted, which was then integrated with a literature-based GO annotation with computational assistance. For similarity-based GO annotation, a stringent reciprocal best hits method was used to identify similarity between predicted proteins of M. oryzae and GO proteins from multiple organisms with published associations to GO terms. Significant alignment pairs were manually reviewed. Functional assignments were further cross-validated with manually reviewed data, conserved domains, or data determined by wet-lab experiments. Additionally, the biological appropriateness of the functional assignments was manually checked.
Results
In total, 6,286 proteins received GO term assignments via the homology-based annotation, including 2,870 hypothetical proteins. Literature-based experimental evidence, such as microarray, MPSS, T-DNA insertion mutation, or gene knockout mutation, resulted in 2,810 proteins being annotated with GO terms. Of these, 1,673 proteins were annotated with new terms developed for Plant-Associated Microbe Gene Ontology (PAMGO). In addition, 67 experimentally determined secreted proteins were annotated with PAMGO terms. Integration of the two data sets resulted in 7,412 proteins (57%) being annotated with 1,957 distinct and specific GO terms. Unannotated proteins were assigned to the 3 root terms. The Version 5 GO annotation is publicly queryable via the GO site .
Additionally, the genome of M. oryzae is constantly being refined and updated as new information is incorporated. For the latest GO annotation of the Version 6 genome, please visit our website . The preliminary GO annotation of the Version 6 genome is placed in a local MySQL database that is publicly queryable via a user-friendly interface, the Adhoc Query System.
Conclusion
Our analysis provides comprehensive and robust GO annotations of the M. oryzae genome assemblies that will serve as a solid foundation for further functional interrogation of M. oryzae.
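
The stringent reciprocal best hits criterion used in the similarity-based step can be sketched as follows. This is an illustration under assumptions: pairwise alignment scores (for example, BLAST bit scores) are taken as precomputed in both directions, and ties keep the first hit seen. The abstract does not specify the alignment tool, score, or thresholds, and all identifiers are placeholders.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, str, float]  # (query, subject, alignment score, e.g. bit score)


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first seen wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores: M. oryzae proteins (Mo*) against annotated proteins (P*).
a_vs_b = [("MoA", "P1", 300.0), ("MoA", "P2", 150.0), ("MoB", "P2", 90.0), ("MoC", "P3", 200.0)]
b_vs_a = [("P1", "MoA", 310.0), ("P2", "MoA", 160.0), ("P2", "MoB", 80.0),
          ("P3", "MoD", 250.0), ("P3", "MoC", 190.0)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('MoA', 'P1')]
```

MoB and MoC are rejected because their best partners prefer a different M. oryzae protein; this one-directional disagreement is what makes the reciprocal criterion stricter than a single best-hit search.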

  • Book Chapter
  • Citations: 14
  • 10.15586/alzheimersdisease.2019.ch2
Gene Ontology: A Resource for Analysis and Interpretation of Alzheimer’s Disease Data
  • Nov 29, 2019
  • Barbara Kramarz + 1 more

Gene Ontology (GO) is a universal resource for analyses and interpretation of high-throughput biological datasets. GO is developed and curated by several different groups, based at scientific institutions around the world, working together under the auspices of the GO Consortium. GO annotations capture biological functional knowledge by associating gene products with GO terms. GO term and gene product records all have computer-readable accession numbers; therefore, these annotations can be easily used for analyses of large datasets while retaining human-readable labels. The UCL Functional Gene Annotation group focuses on GO annotation of human gene products. Our group has led initiatives to systematically annotate proteins and microRNAs across specific biomedical fields, and our current biocuration effort, funded by the Alzheimer's Research UK foundation, is focused on dementia and Alzheimer's disease. Our group has also contributed to the development and revision of the ontology describing neurological domains of biology. Here we present an overview of GO and explain how our work, as well as the work of other members of the GO Consortium, is improving the neurological domains of the GO resource. These biocuration efforts will benefit the dementia and Alzheimer's research community by rendering GO more suitable for analyses of neurological datasets.

  • Research Article
  • Citations: 3
  • 10.5511/plantbiotechnology.26.9
DAGViz: a directed acyclic graph browser that supports analysis of Gene Ontology annotation
  • Jan 1, 2009
  • Plant Biotechnology
  • Kentaro Yano + 3 more

The number of nucleotide and protein sequences deposited in public databases has rapidly increased in recent times. To provide consistent annotation of orthologous genes across organisms, annotation using Gene Ontology (GO) terms is shared among most of these databases. GO outlines a structured vocabulary used for describing biological properties of gene products. The illustration of GO terms using complete paths to the root term on the basis of a directed acyclic graph (DAG) approach aids in the systematic grasping of the functional information related to a gene product. However, the website provided by the GO Consortium does not present DAGs for two or more GO terms due to difficulties in the depiction of the complex relationships of such GO terms. To overcome these problems, we have constructed a DAG-based browser, termed DAGViz, that shows DAG-based information of multiple GO terms assigned to one or more genes within a single screen (http://www.pgb.kazusa.or.jp/dagviz/). In the current report, we illustrate the advantages of DAGViz in analyzing GO annotation.
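
The underlying operation, collecting every path from several GO terms up to the root so they can be drawn together in one directed acyclic graph, amounts to an ancestor traversal over parent links. This is a generic sketch assuming a simple child-to-parents mapping with placeholder term names; it is not DAGViz's own code.

```python
from typing import Dict, List, Set, Tuple


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All terms on any path from `term` up to the root, including `term`."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t in seen:
            continue  # already reached via another path (a DAG, not a tree)
        seen.add(t)
        stack.extend(parents.get(t, []))
    return seen


def shared_subgraph(terms: List[str],
                    parents: Dict[str, List[str]]) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Nodes and child->parent edges needed to draw all root paths of `terms` in one view."""
    nodes: Set[str] = set().union(*(ancestors(t, parents) for t in terms))
    edges = {(child, p) for child in nodes for p in parents.get(child, [])}
    return nodes, edges


# Toy ontology: C has two parents, so it reaches the root by two paths.
parents = {"root": [], "A": ["root"], "B": ["root"], "C": ["A", "B"], "D": ["B"]}
nodes, edges = shared_subgraph(["C", "D"], parents)
print(sorted(nodes))  # ['A', 'B', 'C', 'D', 'root']
```

Taking the union of ancestor sets is what lets shared ancestors (here B and root) appear once rather than being duplicated per term, which is the difficulty with drawing multiple terms that the abstract mentions.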

  • Research Article
  • Citations: 14
  • 10.1186/s12918-016-0361-5
Interspecies gene function prediction using semantic similarity
  • Dec 1, 2016
  • BMC Systems Biology
  • Guoxian Yu + 3 more

Background
Gene Ontology (GO) is a collaborative project that maintains and develops a controlled vocabulary (terms) describing the molecular function, biological role and cellular location of gene products in a hierarchical ontology. GO also provides annotations that associate genes with GO terms. Members of the GO Consortium independently and collaboratively annotate gene products with terms, mainly for the model organisms (species) they are interested in. Because of experimental ethics, the research interests of biologists and resource limitations, homologous genes from different species are currently annotated with different terms. These differences are attributable more to incomplete annotation of the genes than to functional differences between them.
Results
Semantic similarity between genes is derived from the GO hierarchy and the genes' annotations. It correlates positively with similarity derived from various types of biological data and has been applied to predict gene function. In this paper, we investigate whether the annotations of incompletely annotated genes can be replenished using semantic similarity between genes from two homologous species. For this investigation, we use three representative semantic similarity metrics to compute similarity between genes from two species. Next, we determine the k nearest neighbor genes from the two species under the chosen metric and use the terms annotated to the k neighbors of a gene to replenish that gene's annotations. We perform experiments on archived (January 2014 to January 2016) GO annotations of four species (human, mouse, Danio rerio and Arabidopsis thaliana) to assess the contribution of semantic similarity between genes from different species.
The experimental results demonstrate that: (1) semantic similarity between genes from homologous species contributes much more to the improvement in accuracy (by 53.22%) than genes from a single species alone or genes from two species with low homology; (2) GO annotations of genes from homologous species are complementary to each other.
Conclusions
Our study shows that semantic-similarity-based interspecies gene function annotation using homologous species is more effective than traditional intraspecies approaches. This work can promote further research on semantic-similarity-based function prediction across species.
Electronic supplementary material
The online version of this article (doi:10.1186/s12918-016-0361-5) contains supplementary material, which is available to authorized users.
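
The replenishment step (take the k nearest genes from the other species under a semantic similarity metric, then transfer their terms) can be sketched as a similarity-weighted vote. The abstract does not say how the k neighbors' terms are combined, so the weighting, the `threshold` parameter, and all gene and term names here are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Set


def replenish(gene: str,
              annotations: Dict[str, Set[str]],
              similarity: Dict[str, float],
              k: int = 3,
              threshold: float = 0.5) -> Set[str]:
    """Propose new GO terms for `gene` from its k most similar genes in the other species.

    `similarity` maps each candidate gene to its semantic similarity with `gene`.
    A term is proposed when its similarity-weighted vote share reaches `threshold`.
    """
    neighbors = sorted(similarity, key=similarity.get, reverse=True)[:k]
    total = sum(similarity[n] for n in neighbors)
    if total <= 0:
        return set()
    votes: Dict[str, float] = defaultdict(float)
    for n in neighbors:
        for term in annotations.get(n, set()):
            votes[term] += similarity[n]
    known = annotations.get(gene, set())
    return {t for t, v in votes.items() if v / total >= threshold and t not in known}


annotations = {
    "humanG": {"T1"},
    "mouse1": {"T1", "T2"},
    "mouse2": {"T2", "T3"},
    "mouse3": {"T3"},
    "mouse4": {"T4"},
}
similarity = {"mouse1": 0.9, "mouse2": 0.8, "mouse3": 0.3, "mouse4": 0.1}
print(sorted(replenish("humanG", annotations, similarity)))  # ['T2', 'T3']
```

T4 is never considered because mouse4 falls outside the k = 3 nearest neighbors, and T1 is dropped because humanG already carries it.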

  • Research Article
  • Citations: 12
  • 10.1186/gb-2008-9-s1-s1
A race through the maze of genomic evidence
  • Jan 1, 2008
  • Genome Biology
  • Timothy R Hughes + 1 more

  • Research Article
  • Citations: 8
  • 10.1007/978-1-4939-3743-1_17
Annotation Extensions.
  • Nov 4, 2016
  • Methods in molecular biology (Clifton, N.J.)
  • Rachael P Huntley + 1 more

The specificity of knowledge that Gene Ontology (GO) annotations currently can represent is still restricted by the legacy format of the GO annotation file, a format intentionally designed for simplicity to keep the barriers to entry low and thus encourage initial adoption. Historically, the information that could be captured in a GO annotation was simply the role or location of a gene product, although genetically interacting or binding partners could be specified. While there was no mechanism within the original GO annotation format for capturing additional information about the context of a GO term, such as the target gene of an activity or the location of a molecular function, the long-term vision for the GO Consortium was to provide greater expressivity in its annotations to capture physiologically relevant information. Thus, as a step forwards, the GO Consortium has introduced a new field into the annotation format, annotation extensions, which can be used to capture valuable contextual detail. This provides experimentally verified links between gene products and other physiological information that is crucial for accurate analysis of pathway and network data. This chapter will provide a simple overview of annotation extensions, illustrated with examples of their usage, and explain why they are useful for scientists and bioinformaticians alike.
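
In GAF 2.x files the annotation extension field is column 16, written as relation(ID) pairs. As far as we understand the convention, pipes separate independent extension sets and commas join statements that hold together within one set. A minimal parser sketch under that assumption follows; the identifiers in the example are placeholders, and IDs containing commas or parentheses are not handled.

```python
import re
from typing import List, Tuple

# One extension statement: relation(ID), e.g. occurs_in(CL:0000000).
_EXT = re.compile(r"^([A-Za-z0-9_]+)\(([^()]+)\)$")


def parse_extensions(field: str) -> List[List[Tuple[str, str]]]:
    """Split an annotation extension field into sets of (relation, ID) pairs.

    Assumed convention: '|' separates independent sets, ',' joins statements within a set.
    """
    groups: List[List[Tuple[str, str]]] = []
    if not field.strip():
        return groups  # empty column: no extensions
    for group in field.split("|"):
        parsed = []
        for item in group.split(","):
            m = _EXT.match(item.strip())
            if m is None:
                raise ValueError(f"malformed extension: {item!r}")
            parsed.append((m.group(1), m.group(2)))
        groups.append(parsed)
    return groups


field = "has_input(UniProtKB:Q00000),occurs_in(CL:0000000)|part_of(GO:0000000)"
print(parse_extensions(field))
# [[('has_input', 'UniProtKB:Q00000'), ('occurs_in', 'CL:0000000')], [('part_of', 'GO:0000000')]]
```

Keeping the sets as separate lists preserves the distinction between "both of these hold together" and "either context applies", which a flat list of pairs would lose.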

  • Research Article
  • Citations: 20
  • 10.1016/j.tim.2009.04.003
Applying the Gene Ontology in microbial annotation
  • Jul 1, 2009
  • Trends in Microbiology
  • Michelle G Giglio + 3 more

  • Research Article
  • Citations: 8
  • 10.1093/database/bas038
Developing a biocuration workflow for AgBase, a non-model organism database
  • Nov 15, 2012
  • Database: The Journal of Biological Databases and Curation
  • Lakshmi Pillai + 5 more

AgBase provides annotation for agricultural gene products using the Gene Ontology (GO) and Plant Ontology, as appropriate. Unlike model organism species, agricultural species have a body of literature that does not just focus on gene function; to improve efficiency, we use text mining to identify literature for curation. The first component of our annotation interface is the gene prioritization interface that ranks gene products for annotation. Biocurators select the top-ranked gene and mark annotation for these genes as ‘in progress’ or ‘completed’; links enable biocurators to move directly to our biocuration interface (BI). Our BI includes all current GO annotation for gene products and is the main interface to add/modify AgBase curation data. The BI also displays Extracting Genic Information from Text (eGIFT) results for each gene product. eGIFT is a web-based, text-mining tool that associates ranked, informative terms (iTerms) and the articles and sentences containing them, with genes. Moreover, iTerms are linked to GO terms, where they match either a GO term name or a synonym. This enables AgBase biocurators to rapidly identify literature for further curation based on possible GO terms. Because most agricultural species do not have standardized literature, eGIFT searches all gene names and synonyms to associate articles with genes. As many of the gene names can be ambiguous, eGIFT applies a disambiguation step to remove matches that do not correspond to this gene, and filtering is applied to remove abstracts that mention a gene in passing. The BI is linked to our Journal Database (JDB) where corresponding journal citations are stored. Just as importantly, biocurators also add to the JDB citations that have no GO annotation. The AgBase BI also supports bulk annotation upload to facilitate our Inferred from electronic annotation of agricultural gene products. 
All annotations must pass standard GO Consortium quality checking before release in AgBase.
Database URL: http://www.agbase.msstate.edu/

  • Research Article
  • Citations: 179
  • 10.1038/msb.2013.40
A map of cell type-specific auxin responses.
  • Jan 1, 2013
  • Molecular Systems Biology
  • Bastiaan O R Bargmann + 10 more

In plants, changes in local auxin concentrations can trigger a range of developmental processes as distinct tissues respond differently to the same auxin stimulus. However, little is known about how auxin is interpreted by individual cell types. We performed a transcriptomic analysis of responses to auxin within four distinct tissues of the Arabidopsis thaliana root and demonstrate that different cell types show competence for discrete responses. The majority of auxin-responsive genes displayed a spatial bias in their induction or repression. The novel data set was used to examine how auxin influences tissue-specific transcriptional regulation of cell-identity markers. Additionally, the data were used in combination with spatial expression maps of the root to plot a transcriptomic auxin-response gradient across the apical and basal meristem. The readout revealed a strong correlation for thousands of genes between the relative response to auxin and expression along the longitudinal axis of the root. This data set and comparative analysis provide a transcriptome-level spatial breakdown of the response to auxin within an organ where this hormone mediates many aspects of development.

  • Research Article
  • Citations: 170
  • 10.1093/bioinformatics/btm195
Information theory applied to the sparse gene ontology annotation network to predict novel gene function.
  • Jul 1, 2007
  • Bioinformatics (Oxford, England)
  • Ying Tao + 4 more

Despite advances in the gene annotation process, the functions of a large portion of gene products remain insufficiently characterized. In addition, the in silico prediction of novel Gene Ontology (GO) annotations for partially characterized gene functions or processes is highly dependent on reverse genetic or functional genomic approaches. To our knowledge, no prediction method has been demonstrated to be highly accurate for sparsely annotated GO terms (those associated with fewer than 10 genes). We propose a novel approach, information theory-based semantic similarity (ITSS), to automatically predict molecular functions of genes based on existing GO annotations. Using a 10-fold cross-validation, we demonstrate that the ITSS algorithm obtains prediction accuracies (precision 97%, recall 77%) comparable to other machine learning algorithms when compared in similar conditions over densely annotated portions of the GO datasets. This method is able to generate highly accurate predictions in sparsely annotated portions of GO, where previous algorithms have failed. As a result, our technique generates an order of magnitude more functional predictions than previous methods. A 10-fold cross-validation demonstrated a precision of 90% at a recall of 36% for the algorithm over sparsely annotated networks of the recent GO annotations (about 1400 GO terms and 11,000 genes in Homo sapiens). To our knowledge, this article presents the first historical rollback validation for the predicted GO annotations, which may represent more realistic conditions than more widely used cross-validation approaches. By manually assessing a random sample of 100 predictions conducted in a historical rollback evaluation, we estimate that a minimum precision of 51% (95% confidence interval: 43-58%) can be achieved for the human GO Annotation file dated 2003. The program is available on request.
The 97,732 positive predictions of novel gene annotations from the 2005 GO Annotation dataset and other supplementary information is available at http://phenos.bsd.uchicago.edu/ITSS/. Supplementary data are available at Bioinformatics online.
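
ITSS builds on semantic similarity derived from existing annotations. As a hedged illustration of the information-theoretic ingredient only (this is not the ITSS algorithm), the information content of a term can be estimated as the negative log of the fraction of genes annotated to that term or any of its descendants, and Resnik similarity then takes the information content of the most informative common ancestor. The toy ontology and gene annotations are invented.

```python
import math
from typing import Dict, List, Set


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """`term` plus every term reachable by following parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def information_content(annotations: Dict[str, Set[str]],
                        parents: Dict[str, List[str]]) -> Dict[str, float]:
    """IC(t) = -log(fraction of genes annotated to t or to any descendant of t)."""
    counts: Dict[str, int] = {}
    for terms in annotations.values():
        # True-path rule: a gene annotated to a term also counts for all its ancestors.
        closure: Set[str] = set().union(*(ancestors(t, parents) for t in terms))
        for t in closure:
            counts[t] = counts.get(t, 0) + 1
    n = len(annotations)
    return {t: -math.log(c / n) for t, c in counts.items()}


def resnik(t1: str, t2: str, parents: Dict[str, List[str]], ic: Dict[str, float]) -> float:
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic.get(t, 0.0) for t in common), default=0.0)


parents = {"root": [], "A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"]}
annotations = {"g1": {"A1"}, "g2": {"A2"}, "g3": {"B"}, "g4": {"A1"}}
ic = information_content(annotations, parents)
print(round(resnik("A1", "A2", parents, ic), 3))  # 0.288, i.e. log(4/3)
```

Terms that few genes reach get high information content, which is why similarity computed this way remains informative in sparsely annotated regions of the ontology.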

  • Research Article
  • Citations: 25
  • 10.1186/1471-2105-9-440
PlasmoDraft: a database of Plasmodium falciparum gene function predictions based on postgenomic data
  • Oct 16, 2008
  • BMC Bioinformatics
  • Laurent Bréhélin + 2 more

Background
Of the 5 484 predicted proteins of Plasmodium falciparum, the main causative agent of malaria, about 60% do not have sufficient sequence similarity with proteins in other organisms to warrant provision of functional assignments. Non-homology methods are thus needed to obtain functional clues for these uncharacterized genes.
Results
We present PlasmoDraft, a database of Gene Ontology (GO) annotation predictions for P. falciparum genes based on postgenomic data. Predictions of PlasmoDraft are achieved with a Guilt By Association method named Gonna. This involves (1) a predictor that proposes GO annotations for a gene based on the similarity of its profile (measured with transcriptome, proteome or interactome data) with genes already annotated by GeneDB; (2) a procedure that estimates the confidence of the predictions achieved with each data source; (3) a procedure that combines all data sources to provide a global summary and confidence estimate of the predictions. Gonna has been applied to all P. falciparum genes using most publicly available transcriptome, proteome and interactome data sources. Gonna provides predictions for numerous genes without any annotations. For example, 2 434 genes without any annotations in the Biological Process ontology are associated with specific GO terms (e.g. Rosetting, Antigenic variation), and among these, 841 have confidence values above 50%. In the Cellular Component and Molecular Function ontologies, 1 905 and 1 540 uncharacterized genes are associated with specific GO terms, respectively (740 and 329 with confidence value above 50%).
Conclusion
All predictions along with their confidence values have been compiled in PlasmoDraft, which thus provides an extensive database of GO annotation predictions that can be achieved with these data sources. The database can be accessed in different ways.
A global view allows for a quick inspection of the GO terms that are predicted with high confidence, depending on the various data sources. A gene view and a GO term view allow for the search of potential GO terms attached to a given gene, and genes that potentially belong to a given GO term.
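
The predictor step of Gonna is guilt by association: an unannotated gene inherits candidate GO terms from annotated genes whose profiles resemble its own. A stripped-down sketch for a single expression-style data source, assuming Pearson correlation and a fixed cutoff `min_r` (both assumptions); Gonna's per-source confidence estimation and cross-source combination are not reproduced here, and all gene and term names are placeholders.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def guilt_by_association(query: Sequence[float],
                         profiles: Dict[str, Sequence[float]],
                         annotations: Dict[str, Set[str]],
                         min_r: float = 0.8) -> Dict[str, float]:
    """Score terms by the share of correlated, annotated genes that carry them."""
    neighbors: List[str] = [
        g for g, p in profiles.items()
        if g in annotations and pearson(query, p) >= min_r
    ]
    if not neighbors:
        return {}
    counts = Counter(t for g in neighbors for t in annotations[g])
    return {t: c / len(neighbors) for t, c in counts.items()}


query = [1.0, 2.0, 3.0, 4.0]  # profile of an uncharacterized gene
profiles = {"gA": [1.0, 2.0, 3.0, 5.0], "gB": [2.0, 4.0, 6.0, 8.0], "gC": [4.0, 3.0, 2.0, 1.0]}
annotations = {"gA": {"T1"}, "gB": {"T1", "T2"}, "gC": {"T3"}}
print(sorted(guilt_by_association(query, profiles, annotations).items()))
```

Here gC is anti-correlated with the query, so its term T3 is never proposed; T1 receives the highest score because both correlated neighbors carry it.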
