Abstract

The FANTOM5 consortium described the promoter-level expression atlas of human and mouse by using CAGE (Cap Analysis of Gene Expression) with single molecule sequencing. In the original publications, the GRCh37/hg19 and NCBI37/mm9 assemblies were used as the reference genomes of human and mouse, respectively; later, the Genome Reference Consortium released the newer genome assemblies GRCh38/hg38 and GRCm38/mm10. To increase the utility of the atlas in future research, we reprocessed the data to make them available on the recent genome assemblies. The data include observed frequencies of transcription start sites (TSSs) based on the realignment of CAGE reads, and TSS peaks converted from those defined on the previous reference assemblies. Annotations of the peak names were also updated based on the latest public databases. The reprocessed results enable us to examine the frequencies of transcription initiation on the recent genome assemblies and to refer to promoters with updated information consistently across the genome assemblies.

Highlights

  • A complete genome sequence provides an essential infrastructure to study an organism at the molecular level

  • The Genome Reference Consortium has been providing a variety of reference genome assemblies, updated in a timely manner in order to reflect the outcomes of recent research[2]

  • Recent efforts to improve the genome annotations based on the current assemblies have been carried out, including the update of RefSeq[3] and GENCODE transcripts[4]

Background & Summary

A complete genome sequence provides an essential infrastructure to study an organism at the molecular level. We reprocessed the FANTOM5 data to make it available on the current assemblies GRCh38/hg38 and GRCm38/mm10. We also added new CAGE peaks that appear only on the latest genome assemblies, for example, peaks for newly introduced genes. For this purpose, we called peaks on the CAGE reads realigned to the latest genome assemblies (Data Citation 1), using the same method as in the original report[8], and selected the CAGE peaks in that result that did not overlap the converted CAGE peaks, so that they could be merged with the converted set. The reprocessed data of the FANTOM5 human and mouse CAGE datasets (Data Citations 2–10) are publicly available from the FANTOM5 data web site (http://fantom.gsc.riken.jp/5/datafiles/reprocessed/), LSDB Archive (Data Citation 11) and Figshare (hg38 (Data Citation 12) and mm10 (Data Citation 13)).
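The merging step described above can be illustrated with a minimal sketch. It is not the consortium's pipeline: the `Peak` record, the function names, the BED-style half-open coordinates, and the strand-aware overlap test are all assumptions made for illustration, and the example peak names are invented.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED-style; an assumption)
    end: int     # exclusive
    strand: str
    name: str


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks share at least one base on the same chromosome and strand."""
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start < b.end and b.start < a.end)


def merge_peak_sets(converted, called):
    """Keep every converted peak; add only called peaks that overlap none of them."""
    # Group converted peaks by (chromosome, strand) to limit the comparisons.
    by_key = {}
    for p in converted:
        by_key.setdefault((p.chrom, p.strand), []).append(p)

    novel = [
        q for q in called
        if not any(overlaps(q, p) for p in by_key.get((q.chrom, q.strand), []))
    ]
    return sorted(list(converted) + novel,
                  key=lambda p: (p.chrom, p.start, p.end, p.strand))


# Hypothetical peaks: one converted from the old assembly, three newly called.
converted = [Peak("chr1", 100, 120, "+", "p1@GENEA")]
called = [
    Peak("chr1", 110, 130, "+", "new1"),  # overlaps the converted peak: dropped
    Peak("chr1", 110, 130, "-", "new2"),  # opposite strand: kept
    Peak("chr2", 5, 9, "+", "new3"),      # different chromosome: kept
]
merged = merge_peak_sets(converted, called)
print([p.name for p in merged])
```

Converted peaks take priority in this sketch, so identifiers carried over from the previous assembly stay stable and newly called peaks only fill regions the converted set does not cover. For genome-scale peak sets, an interval tree or a sorted sweep would replace the pairwise comparison.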
