Abstract
Nematodes such as Caenorhabditis elegans are powerful systems for studying virtually all aspects of biology. Their species richness, together with the tremendous genetic knowledge from C. elegans, facilitates the evolutionary study of biological functions using reverse genetics. However, the ability to identify orthologs of candidate genes in other species can be hampered by erroneous gene annotations. To improve gene annotation in the nematode model organism Pristionchus pacificus, we performed a genome-wide screen for C. elegans genes with potentially incorrectly annotated P. pacificus orthologs. We initiated a community-based project to manually inspect more than two thousand candidate loci and to propose new gene models based on recently generated Iso-seq and RNA-seq data. In most cases, misannotation of C. elegans orthologs was due to artificially fused gene predictions and completely missing gene models. The community-based curation raised the gene count from 25,517 to 28,036 and increased the single-copy ortholog completeness level from 86% to 97%. This pilot study demonstrates how even small-scale crowdsourcing can drastically improve gene annotations. In the future, similar approaches can be applied to other species, gene sets, and even larger communities, thus making manual annotation of large parts of the genome feasible.
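The abstract does not describe how suspicious loci were detected. One plausible heuristic for spotting artificially fused gene predictions, sketched here under our own assumptions, is to look for a P. pacificus protein whose alignments to two different C. elegans proteins fall on largely non-overlapping parts of the P. pacificus sequence. The `Hit` record, the function names, the gene IDs, and the 10-residue `max_overlap` threshold are all hypothetical; in practice the alignment coordinates would come from a protein search tool such as BLASTP.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Hit(NamedTuple):
    """One local alignment of a P. pacificus protein (query) to a C. elegans protein (subject)."""
    query: str
    subject: str
    qstart: int  # first aligned residue on the query (1-based)
    qend: int    # last aligned residue on the query (inclusive)


def query_spans(hits):
    """Merge all hits of each (query, subject) pair into one covering span on the query."""
    spans = {}
    for h in hits:
        lo, hi = sorted((h.qstart, h.qend))
        key = (h.query, h.subject)
        if key in spans:
            old_lo, old_hi = spans[key]
            lo, hi = min(lo, old_lo), max(hi, old_hi)
        spans[key] = (lo, hi)
    return spans


def flag_possible_fusions(hits, max_overlap=10):
    """Flag queries whose spans for two different subjects barely overlap.

    Returns {query: sorted list of the C. elegans subjects involved}.
    max_overlap is a hypothetical tolerance in residues.
    """
    by_query = defaultdict(list)
    for (query, subject), span in query_spans(hits).items():
        by_query[query].append((subject, span))
    flagged = defaultdict(set)
    for query, items in by_query.items():
        for (sub_a, (a_lo, a_hi)), (sub_b, (b_lo, b_hi)) in combinations(items, 2):
            # Residues shared by both spans; a negative value means a gap between them.
            overlap = min(a_hi, b_hi) - max(a_lo, b_lo) + 1
            if overlap <= max_overlap:
                flagged[query].update((sub_a, sub_b))
    return {q: sorted(subs) for q, subs in flagged.items()}


# Hypothetical toy alignments; IDs and coordinates are made up.
hits = [
    Hit("Ppa_g1", "Cel_geneA", 1, 300),    # N-terminal part matches gene A
    Hit("Ppa_g1", "Cel_geneB", 320, 610),  # C-terminal part matches gene B
    Hit("Ppa_g2", "Cel_geneC", 5, 400),    # two paralogs hitting the same region
    Hit("Ppa_g2", "Cel_geneD", 10, 390),
]
print(flag_possible_fusions(hits))  # {'Ppa_g1': ['Cel_geneA', 'Cel_geneB']}
```

Requiring the two matches to be nearly disjoint keeps paralogous hits to the same region (as for `Ppa_g2`) from being flagged. Multi-domain proteins whose domains match different C. elegans genes would still be flagged, which is one reason such candidates call for manual inspection rather than automatic splitting.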
Highlights
Nematodes such as Caenorhabditis elegans are powerful systems for studying virtually all aspects of biology.
While automated annotation pipelines perform well enough to be useful for genetic screens[32,33,34] and evolutionary genomic analyses[35,36,37], their output falls far short of the standards of gene annotations in classical model organisms such as C. elegans, Drosophila melanogaster, and Mus musculus, which have been curated over decades by a large research community[38].
To make the P. pacificus system more tractable for researchers without extensive genomic and phylogenetic expertise, we need to minimize the discrepancy in gene annotation quality between C. elegans and P. pacificus.
Summary
Nematodes such as Caenorhabditis elegans are powerful systems for studying virtually all aspects of biology. The community-based curation raised the gene count from 25,517 to 28,036 and increased the single-copy ortholog completeness level from 86% to 97%. This pilot study demonstrates how even small-scale crowdsourcing can drastically improve gene annotations. To make the P. pacificus system more tractable for researchers without extensive genomic and phylogenetic expertise, we need to minimize the discrepancy in gene annotation quality between C. elegans and P. pacificus. To this end, we employed an integrative approach, combining comparative genomic and transcriptomic data with crowdsourcing, to improve the P. pacificus annotations of C. elegans homologs and orthologs. A community-based manual curation of suspicious gene models reveals thousands of hidden orthologs and missing homologs. This pilot study can be extended to even larger gene sets and communities, possibly employing citizen scientists, which would raise the quality of gene annotations to the level of classical model organisms[38].
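A simple way to list C. elegans genes whose P. pacificus counterpart may be fused or missing is to compute reciprocal best hits (RBH) between the two proteomes and report the C. elegans genes left without a one-to-one partner. This is a sketch of a standard comparative-genomics technique swapped in for illustration, not the orthology method used in the study; the function names, gene IDs, and bit scores are hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: {(query, subject): bit_score}, e.g. parsed from a protein search table.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(cel_vs_ppa, ppa_vs_cel):
    """Keep C. elegans -> P. pacificus pairs that are each other's best hit."""
    forward = best_hits(cel_vs_ppa)
    reverse = best_hits(ppa_vs_cel)
    return {cel: ppa for cel, ppa in forward.items() if reverse.get(ppa) == cel}


def genes_without_rbh(cel_genes, rbh):
    """C. elegans genes lacking a one-to-one partner: candidates for manual inspection."""
    return sorted(gene for gene in cel_genes if gene not in rbh)


# Hypothetical toy data: gene IDs and scores are made up for illustration.
cel_genes = ["Cel_A", "Cel_B", "Cel_C"]
cel_vs_ppa = {("Cel_A", "Ppa_1"): 250.0, ("Cel_A", "Ppa_2"): 90.0, ("Cel_B", "Ppa_1"): 180.0}
ppa_vs_cel = {("Ppa_1", "Cel_A"): 245.0, ("Ppa_1", "Cel_B"): 175.0, ("Ppa_2", "Cel_A"): 88.0}

rbh = reciprocal_best_hits(cel_vs_ppa, ppa_vs_cel)
print(rbh)                                # {'Cel_A': 'Ppa_1'}
print(genes_without_rbh(cel_genes, rbh))  # ['Cel_B', 'Cel_C']
```

In this toy case `Cel_B` loses its partner because `Ppa_1` already best-matches `Cel_A`, the pattern one would expect if `Ppa_1` were an artificially fused model; `Cel_C` has no hit at all, as for a completely missing gene model. RBH only captures one-to-one relationships, so genuine one-to-many orthologs would also appear in this list and still need manual review.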