Analyzing mass spectrometric data from complex biological mixtures is challenging because the datasets are vast. Data treatment pipelines that distinguish chemical signals from noise are lacking, and these tasks have so far been left to the discretion of the analyst. The aim of this work is to demonstrate an analytical workflow that enhances confidence in metabolomics data before biological questions are addressed, by combining serial dilution of a complex botanical mixture with high-dimensional data analysis. Furthermore, we provide an alternative to the univariate p-value cutoff from a t-test used in the blank subtraction procedure between negative controls and biological samples. Analysis of a serial dilution of the complex mixture under electrospray ionization was proposed as a way to study the chemical complexity of the metabolome firsthand. Advanced statistical models based on high-dimensional penalized regression were employed to study both the relationship between concentration and ion intensity and the ion-ion relationships within each one-second retention-time sub-dataset. The multivariate analysis was carried out with a tool built in-house, called metabolite ions extraction and visualization, which was implemented in the R environment. A test case of the medicinal plant goldenseal (Hydrastis canadensis L.) showed greater metabolome coverage for features deemed "important" by the multivariate analysis than for features deemed "significant" by a univariate t-test. As an illustration, the data analysis workflow suggested an unexpected putative compound, 20-hydroxyecdysone. This suggestion was confirmed by MS/MS acquisition and a literature search. The multivariate analytical workflow selects "true" metabolite ion signals and provides an alternative to a univariate p-value cutoff from a t-test, thus enhancing the data analysis process in metabolomics.
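
The idea of relating concentration to ion intensity across a serial dilution with a penalized regression can be illustrated with a minimal sketch. It is not the in-house R tool: it assumes an L1 (lasso) penalty fitted by coordinate descent as a stand-in for the high-dimensional penalized regression named above, and the ion names (`ion_A`, `ion_B`, `ion_C`), the intensities, the dilution levels, and the penalty `lam` are all hypothetical toy values chosen for illustration.

```python
import math


def standardize(column):
    """Center a column and scale it to unit (population) standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if sd == 0.0:
        return None  # constant column: carries no concentration information
    return [(v - mean) / sd for v in column]


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; exactly 0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_select(intensities, concentration, lam, n_iter=100):
    """Coordinate-descent lasso; returns {feature name: coefficient}.

    Minimizes (1/2n) * ||y - Z b||^2 + lam * ||b||_1, where Z holds the
    standardized ion intensities and y the centered concentrations.
    """
    n = len(concentration)
    y_mean = sum(concentration) / n
    residual = [c - y_mean for c in concentration]
    columns = {name: standardize(col) for name, col in intensities.items()}
    beta = {name: 0.0 for name in intensities}
    for _ in range(n_iter):
        for name, z in columns.items():
            if z is None:
                continue  # skip zero-variance features
            # Covariance of this feature with the partial residual.
            rho = sum(zi * (ri + beta[name] * zi)
                      for zi, ri in zip(z, residual)) / n
            new_beta = soft_threshold(rho, lam)
            # Keep the residual consistent with the updated coefficient.
            delta = new_beta - beta[name]
            residual = [ri - delta * zi for ri, zi in zip(residual, z)]
            beta[name] = new_beta
    return beta


# Hypothetical toy data: five two-fold dilution levels, three ion features.
concentration = [1.0, 2.0, 4.0, 8.0, 16.0]
intensities = {
    "ion_A": [10, 20, 40, 80, 160],  # scales with concentration
    "ion_B": [5, 3, 6, 4, 5],        # fluctuates independently of dilution
    "ion_C": [7, 7, 7, 7, 7],        # constant background
}

coefs = lasso_select(intensities, concentration, lam=0.5)
selected = [name for name, b in coefs.items() if b != 0.0]
print(selected)
```

In this toy case only the feature whose intensity follows the dilution series keeps a nonzero coefficient, while the fluctuating and constant features are shrunk to exactly zero. This is the sense in which a penalized multivariate model can flag features as "important" without relying on a per-feature p-value cutoff.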