Improvements to the Rice Genome Annotation Through Large-Scale Analysis of RNA-Seq and Proteomics Data Sets.

Zhe Ren, Ruo Zhou, Siqi Liu, Kai Li, Bo Wen, Nina Pugh, Da Qi, Andrew R Jones, Shaohang Xu

doi:10.1074/mcp.ra118.000832

Abstract

Rice (Oryza sativa) is one of the world's most important crops. The genome has been available for over 10 years and has undergone several rounds of annotation. We created a comprehensive database in which to search for protein-level evidence, comprising transcripts from 29 public RNA sequencing data sets, officially predicted genes from Ensembl Plants, and common contaminants. We re-analyzed nine publicly accessible rice proteomics data sets. In total, we identified 420K peptide-spectrum matches from 47K peptides and 8,187 protein groups. Of these peptides, 4,168 were initially classed as putative novel peptides (not matching official genes). Following a strict filtration scheme to rule out other possible explanations, we discovered 1,584 high-confidence novel peptides. The novel peptides were clustered into 692 genomic loci where our results suggest annotation improvements. 80% of the novel peptides had an ortholog match in the curated protein sequence set from at least one other plant species. Among the peptides clustering in intergenic regions (and thus potentially representing new genes), 101 loci were identified, of which 43 had a high-confidence hit for a protein domain. Our results can be displayed as tracks on the Ensembl genome browser or other browsers supporting Track Hubs, to support re-annotation of the rice genome.
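The abstract does not say how the novel peptides were grouped into genomic loci. As a minimal sketch of one plausible approach, the Python below merges peptide hits that lie on the same chromosome and strand and fall within a fixed distance of each other. The function name `cluster_into_loci`, the `max_gap` threshold of 1,000 bp, and the example coordinates are all illustrative assumptions, not the authors' method or data.

```python
from collections import defaultdict


def cluster_into_loci(peptides, max_gap=1000):
    """Group peptide hits into loci (hypothetical rule, not from the paper).

    peptides: iterable of (chrom, strand, start, end) tuples.
    Hits on the same chromosome and strand are merged into one locus when
    the next hit starts within max_gap bp of the current locus end.
    Returns a list of (chrom, strand, locus_start, locus_end, n_peptides).
    """
    by_key = defaultdict(list)
    for chrom, strand, start, end in peptides:
        by_key[(chrom, strand)].append((start, end))

    loci = []
    for (chrom, strand), spans in sorted(by_key.items()):
        spans.sort()
        cur_start, cur_end = spans[0]
        count = 1
        for start, end in spans[1:]:
            if start - cur_end <= max_gap:
                # Close enough: extend the current locus.
                cur_end = max(cur_end, end)
                count += 1
            else:
                # Too far away: close the current locus and start a new one.
                loci.append((chrom, strand, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
        loci.append((chrom, strand, cur_start, cur_end, count))
    return loci


# Made-up coordinates for illustration only.
example_hits = [
    ("1", "+", 100, 130),
    ("1", "+", 500, 545),
    ("1", "+", 9000, 9030),
    ("1", "-", 520, 550),
]
loci = cluster_into_loci(example_hits)
for locus in loci:
    print(locus)
```

With these made-up hits, the first two plus-strand peptides merge into one locus, while the distant plus-strand hit and the minus-strand hit each form their own. A real pipeline would more likely cluster relative to existing gene models than by a fixed gap.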

Highlights

Novel peptides matched 101 new loci that are not currently annotated as genes. Data are made persistently available for simple visualization on genome browsers.
The novel peptides were clustered into 692 genomic loci where our results suggest annotation improvements. 80% of the novel peptides had an ortholog match in the curated protein sequence set from at least one other plant species.
Compared with the official junction sites, 1,047,488 junctions (56,432 nonredundant junction sites) in the RNA sequencing (RNA-Seq) data exactly matched annotated junctions, whereas the remaining 1,893,300 junctions (298,891 junction sites) were marked as novel junctions (NJs).
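The junction comparison in the last highlight can be sketched as an exact-match lookup against the annotated set. The Python below is an illustration only: it assumes each junction is a `(chrom, start, end, strand)` tuple and that the same site may be observed more than once (for example, in several data sets), which is one way to read the difference between total junctions and nonredundant junction sites. The function name `classify_junctions` and the example coordinates are hypothetical.

```python
def classify_junctions(observed, annotated):
    """Split observed junctions into annotated matches and novel junctions.

    observed:  list of (chrom, start, end, strand) tuples; a site may
               appear several times (assumed: once per supporting data set).
    annotated: iterable of junction tuples from the official annotation.
    Only exact coordinate-and-strand matches count as annotated.
    """
    annotated_sites = set(annotated)
    matched = [j for j in observed if j in annotated_sites]
    novel = [j for j in observed if j not in annotated_sites]
    return {
        "matched_total": len(matched),
        "matched_sites": len(set(matched)),   # nonredundant sites
        "novel_total": len(novel),
        "novel_sites": len(set(novel)),       # nonredundant novel junctions
    }


# Made-up junctions for illustration only.
annotated = [("1", 1000, 1200, "+"), ("1", 1500, 1700, "+")]
observed = [
    ("1", 1000, 1200, "+"),  # annotated, seen in two data sets
    ("1", 1000, 1200, "+"),
    ("1", 1000, 1250, "+"),  # shifted acceptor: novel, seen twice
    ("1", 1000, 1250, "+"),
    ("2", 400, 900, "-"),    # novel
]
summary = classify_junctions(observed, annotated)
print(summary)
```

Because the match is exact, a junction that differs from an annotated one by a single base at either end is counted as novel.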

Summary

Introduction

We performed a comprehensive proteogenomics analysis of rice by collecting public genomics, transcriptomics and proteomics data to discover novel protein-coding genes and new splice sites.

Journal: Molecular & Cellular Proteomics (MCP)
Publication Date: Oct 5, 2018
Citations: 20
License type: CC BY
