3' untranslated regions (3'UTRs) are essential portions of genes that contain elements necessary for pre-mRNA 3' end processing and are involved in post-transcriptional gene regulation. Despite their importance, they remain poorly characterized in eukaryotes. Here, we used a multi-pronged approach to extract and curate 3'UTR data from 11,533 publicly available datasets, corresponding to the entire collection of Caenorhabditis elegans transcriptomes stored in the NCBI repository from 2009 to 2023. We also ran high-throughput cloning pipelines to identify and validate rare 3'UTR isoforms, and incorporated and manually curated 3'UTR isoforms from previously published datasets. This updated C. elegans 3'UTRome (v3) is the most comprehensive resource in any metazoan to date, covering 97.4% of the 20,362 experimentally validated protein-coding genes, with refined and updated 3'UTR boundaries for 23,489 3'UTR isoforms. We also used this novel dataset to identify and characterize sequence elements involved in pre-mRNA 3' end processing and to update miRNA target predictions. This resource provides important insights into 3'UTR formation, function, and regulation in eukaryotes.