Abstract

In eukaryotes, mRNA abundance is often a poor proxy for protein abundance<sup>1–5</sup>. Despite this, the majority of methods used to dissect function in mammalian biology<sup>6</sup> and to discover biomarkers in complex diseases<sup>7</sup> involve manipulation or measurement of mRNA. The discrepancy between mRNA and protein abundance is likely due to several factors, including differences in the rates of translation and degradation between proteins and cell types<sup>8</sup>, unequal contribution of individual splice variants to the production of a given protein<sup>9</sup>, and cell-type-specific differences in splice variant use<sup>10</sup>. Here we performed experimental and computational time-series analysis of RNA-seq and mass-spectrometry data from three key immune cell types in humans and mice, and constructed mathematical mixed time-delayed splice variant models to predict protein abundances. These models had median correlations to protein abundance measurements of 0.79–0.94, a significant increase from the previously reported 0.21 on Human Protein Atlas data<sup>1</sup>, and outperformed simpler models that lacked multiple splice variants and time delay in cross-validation tests. We showed the importance of our models for biomarker discovery by re-analysing RNA-seq data from five different complex diseases, which led to the prediction of new disease proteins that were validated in multiple sclerosis. Our findings suggest that similar protein abundance models may be created for the most critical cell types in the human body.
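To make the idea of a time-delayed splice variant model concrete, the following is a minimal sketch, not the model used in this work. It assumes, purely for illustration, that protein abundance at time t is a linear combination of splice variant abundances measured a fixed number of sampling steps earlier, and it picks the delay by least-squares fit. The function name `fit_delayed_variant_model`, the linear form, and the delay search are all assumptions introduced here.

```python
import numpy as np


def fit_delayed_variant_model(variants, protein, max_delay):
    """Fit protein[t] ~ sum_k w[k] * variants[t - delay, k] (illustrative only).

    variants : (T, K) array of splice variant abundances over T time points
    protein  : (T,) array of protein abundances at the same time points
    max_delay: largest delay, in sampling steps, to try (must be < T)

    Returns the delay with the lowest mean squared error and its weights.
    """
    variants = np.asarray(variants, dtype=float)
    protein = np.asarray(protein, dtype=float)
    n_times = len(protein)

    best = None
    for delay in range(max_delay + 1):
        # Pair mRNA at time t - delay with protein at time t.
        X = variants[: n_times - delay]
        y = protein[delay:]
        # Ordinary least squares for the per-variant weights.
        weights, *_ = np.linalg.lstsq(X, y, rcond=None)
        # Use the mean error because each delay leaves a different sample count.
        mse = float(np.mean((X @ weights - y) ** 2))
        if best is None or mse < best[0]:
            best = (mse, delay, weights)

    return best[1], best[2]
```

A sketch like this would be compared against a no-delay, single-variant baseline under cross-validation, in the spirit of the comparison described above; the actual model structure and fitting procedure may differ.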
