Abstract

Species identification using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) data relies strongly on reference libraries to differentiate species. Because comprehensive reference libraries are rare, especially for metazoans, we explored how accurately community diversity can be estimated from MALDI-TOF MS data by unsupervised methods in the absence of reference libraries, with the aim of providing a method for future application in ecological research. To find the analysis strategy that agrees best with true community structures, we carried out a simulation with more than 30,000 analyses using different combinations of data transformations, dimensionality reductions, and clustering algorithms. Species profile, Hellinger, and presence/absence transformations were applied to the raw data, and dimensions were reduced using principal component analysis (PCA), t-distributed stochastic neighbor embedding, and uniform manifold approximation and projection. To estimate biodiversity, data were clustered using partitioning around medoids, model-based clustering, and K-means clustering. The analyses were carried out on published mass spectrometry data of harpacticoid copepods. The most successful combinations (Hellinger transformation + PCA or raw data + partitioning around medoids) returned good values even for difficult species distributions containing numerous singleton species. Nevertheless, errors occurred most frequently because of such singleton taxa. Hence, we emphasize replicate sampling across wide sampling areas to increase the minimum number of specimens per species and thus reduce putative sources of error. Our results demonstrate that MALDI-TOF MS data can be used to accurately estimate the biodiversity of unknown communities using unsupervised learning methods. The approach allows biodiversity to be compared between sampled regions for which no reference libraries are available. Hence, data on groups that demand time-consuming identification or are highly abundant, in particular, can be analyzed within a short working time, accelerating ecological studies.
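The three data transformations named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions (species profile: each value divided by its row total; Hellinger: the square root of the species profile; presence/absence: 1 for any non-zero value). The function names and the example intensities are hypothetical and are not taken from the study's R function.

```python
import math


def species_profile(row):
    """Divide each intensity by the row total (relative intensities)."""
    total = sum(row)
    if total == 0:
        # An empty spectrum has no profile; return zeros instead of dividing by zero.
        return [0.0 for _ in row]
    return [value / total for value in row]


def hellinger(row):
    """Square root of the species profile; non-empty rows get unit Euclidean length."""
    return [math.sqrt(value) for value in species_profile(row)]


def presence_absence(row):
    """Record only whether a peak is present (1) or absent (0)."""
    return [1 if value > 0 else 0 for value in row]


# Hypothetical peak intensities of one spectrum (illustrative values only).
spectrum = [0.0, 4.0, 12.0, 0.0]
print(species_profile(spectrum))
print(hellinger(spectrum))
print(presence_absence(spectrum))
```

In this sketch each row stands for one specimen's spectrum and each column for one mass peak; the Hellinger transformation dampens the influence of very intense peaks relative to the raw data.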

Highlights

  • Author Contribution Statement: S.R. carried out the analyses

  • By testing various combinations of data transformations, dimensionality reduction methods, and clustering algorithms, we provide a workflow for unsupervised biodiversity estimation based on MALDI-TOF MS data without the need for reference libraries

  • The nine best estimates, each differing from the correct diversity by less than 10%, used partitioning around medoids (PAM) clustering. These were followed by seven further estimates that also used PAM as the clustering algorithm
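To make the PAM step in the highlights concrete, the following is a minimal sketch of a k-medoids clustering with a PAM-style swap search. It is a simplification under stated assumptions: the initial medoids are simply the first k points (not PAM's BUILD phase), Euclidean distance is used, and the number of clusters k is fixed by hand, since this section does not state how the study chooses it. All names and data points are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(euclidean(p, points[m]) for m in medoids) for p in points)


def pam_swap(points, k):
    """Swap medoids with non-medoids until no swap lowers the total cost."""
    medoids = list(range(k))  # naive start (assumption), not PAM's BUILD phase
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in range(len(points)):
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best - 1e-12:  # accept strict improvements only
                    medoids, best, improved = trial, cost, True
    # Assign every point to the index of its nearest medoid.
    labels = [
        min(range(k), key=lambda j: euclidean(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels


# Hypothetical 2-D coordinates forming three well-separated groups.
points = [
    (0, 0), (0, 1), (1, 0),
    (10, 10), (10, 11), (11, 10),
    (20, 0), (20, 1), (21, 0),
]
medoids, labels = pam_swap(points, k=3)
print(medoids, labels)
```

Unlike K-means, the cluster centers here are always actual observations, so the same search works with any dissimilarity measure in place of `euclidean`.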


Summary

Introduction

Author Contribution Statement: S.R. carried out the analyses. P.M.A. designed the study, contributed to the writing of the manuscript, and gave final approval for publication.

To identify specimens based on a proteomic fingerprint, company-supplied supervised identification software such as the MALDI Biotyper by Bruker is often used. Such tools find the most similar spectra in a reference library and return a certainty value for the resulting identification. Some studies employed techniques such as hierarchical clustering (Kaiser et al. 2018) or principal component analysis (PCA; Hynek et al. 2018) to discriminate species. All these techniques rely on reference libraries to assess species diversity, and some fail to detect false-positive classifications. Assessing biodiversity using proteomic fingerprinting in areas for which no MALDI-TOF reference libraries are available is difficult. Supervised tools such as the Bruker MALDI-TOF Biotyper or the random forest approach cannot provide identifications without a library. The workflow can be applied by using the R function provided in the Supporting Information S1.

