Abstract
The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.
Highlights
The RIKEN Mouse Gene Encyclopedia project was launched with the aim of cloning and sequencing full-length mouse cDNAs
In FANTOM3, 42,031 transcripts were newly annotated and the functional annotation of 4,347 FANTOM2 transcripts was updated with the improved annotation system
Combining the results of FANTOM2 and FANTOM3, 102,801 cDNAs were functionally annotated by the international effort
Summary
The RIKEN Mouse Gene Encyclopedia project was launched with the aim of cloning and sequencing full-length mouse cDNAs. In FANTOM2, this interface was extended to be an all online annotation system from remote sites via the Internet, through the Mouse Annotation Teleconference for RIKEN cDNA Sequences (MATRICS). CDNA collection from a number of novel cellular and tissue sources was continued. In this process, many novel cDNAs derived from distinct genomic loci were fully sequenced. In FANTOM3, these newly sequenced cDNAs were mapped to the mouse genome and subjected to functional annotation. Given the substantial increase in cDNA sequence information in mouse and other mammalian species since FANTOM2, the new annotation process provided the opportunity to update and improve the previous functional annotation of RIKEN cDNAs from FANTOM1 and FANTOM2
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.