Background
Making hypothesis-led interrogations of whole-genome transcriptomic datasets is a challenge. Transcriptional modules define specific biological processes, but multiple modules have now been published with overlapping annotations. In this study we aimed to generate and validate the modular discrimination index (MDI) score, which ranks the accuracy and resolution with which any cell-type-specific module assesses cellular enrichment in tissues from transcriptional datasets.

Methods
We used publicly available transcriptomic datasets of purified immune cell types to compare the expression of modules annotated to reflect different immune cells. MDI scores were derived from the expression of each module in its annotated cell type relative to all other cell types. Independent datasets from human tissue and blood samples were then used to validate the MDI score against the relative enrichment of each module's gene expression.

Findings
We found that 26 T cell modules varied both in their composition and in their gene expression in a tuberculin skin test, a prototypical T cell-driven inflammatory process. To address this heterogeneity, we assessed the accuracy of cell-type modules by calculating MDI scores for all published T cell, B cell, natural killer cell and neutrophil modules. MDI scores strongly predicted each module's ability to show histologically proven cellular enrichment in a variety of independent tissue samples: T cells in psoriatic skin, B cells in lung fibrosis, and neutrophils in erythema nodosum leprosum skin lesions. MDI scores were also validated against changes in cell frequency in blood: modules with the highest MDI scores were most strongly associated with cell numbers, for both high-frequency cells (neutrophils, r²=0·545, p=0·01) and low-frequency cells (B cells, r²=0·865, p=0·01). Finally, we showed that modules with the highest MDI scores detected the T cell enrichment seen in the tuberculin skin test, whereas low MDI scoring modules most often did not.

Interpretation
This study shows that MDI scoring provides a robust and reproducible assessment of a transcriptional module's accuracy and resolution, and that it should be determined before modular deconvolution of tissue samples. The MDI scoring framework can be extended and applied across a range of disciplines in which annotated modules could be used to interrogate transcriptional datasets.

Funding
Wellcome Trust.
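
The abstract does not give the MDI formula, so the following is only a minimal sketch of one plausible reading of "expression of each module in its annotated cell type relative to all other cell types". It assumes a genes-by-cell-types matrix of log2 expression values from purified cells, and scores a module as its mean expression in the annotated cell type minus its average across the remaining cell types. The function names, the difference-of-means definition and the toy expression values are all illustrative assumptions, not the published method or data.

```python
import numpy as np


def module_scores(expr, genes, module):
    """Mean log2 expression of the module's genes in each cell type (one value per column)."""
    rows = [genes.index(g) for g in module if g in genes]
    if not rows:
        raise ValueError("no module genes found in the expression matrix")
    return expr[rows, :].mean(axis=0)


def discrimination_index(expr, genes, cell_types, module, target):
    """Hypothetical index: module score in the annotated cell type minus the
    mean module score across all other cell types (an assumption, not the
    formula used in the study)."""
    scores = module_scores(expr, genes, module)
    t = cell_types.index(target)
    others = np.delete(scores, t)
    return float(scores[t] - others.mean())


# Toy data: the log2 expression values below are invented for illustration only.
genes = ["CD3E", "CD2", "MS4A1", "CD79A", "FCGR3B"]
cell_types = ["T cell", "B cell", "neutrophil"]
expr = np.array([
    [10.0, 2.0, 2.0],   # CD3E
    [9.0, 3.0, 2.0],    # CD2
    [2.0, 11.0, 1.0],   # MS4A1
    [2.0, 10.0, 2.0],   # CD79A
    [1.0, 1.0, 12.0],   # FCGR3B
])

# A module made only of T cell genes versus one contaminated with a B cell gene.
specific = discrimination_index(expr, genes, cell_types, ["CD3E", "CD2"], "T cell")
mixed = discrimination_index(expr, genes, cell_types, ["CD3E", "MS4A1"], "T cell")
print(specific, mixed)
```

Under this assumed definition, the module containing a B cell gene scores lower for T cells than the module built only from T cell genes, which is the kind of ranking the MDI score is described as providing.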