Abstract

Background

Biomolecular methods for species identification are increasingly being utilised in the study of changing environments, at both the microscopic and macroscopic levels. High-throughput peptide mass fingerprinting has largely been applied to bacterial identification, but is increasingly used to identify archaeological and palaeontological skeletal material, yielding information on past environments and human-animal interaction. However, as applications move away from predominantly domesticates and the more abundant wild fauna to a much wider range of less common taxa that do not yet have genetically derived sequence information, robust methods of species identification and biomarker selection need to be determined.

Results

Here we developed a supervised machine learning algorithm for classifying the species of ancient remains based on collagen fingerprinting. The aim was to minimise the prior knowledge of known species required while yielding satisfactory sensitivity and specificity. The algorithm uses iterations of a modified random forest classifier with a similarity scoring system to expand its set of identified samples. We tested it on a set of 6805 spectra and found that a high level of accuracy can be achieved with a training set of five identified specimens per taxon.

Conclusions

This method consistently achieves higher accuracy than two-dimensional principal component analysis and accuracy similar to that of hierarchical clustering using optimised parameters, which greatly reduces requirements for human input. Within the Vertebrata, we demonstrate that this method was able to achieve taxonomic resolution at the family or sub-family level, whereas genus- or species-level identification may require manual interpretation or further experiments. In addition, it identifies species biomarkers beyond those previously published.
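The iterative expansion described in the Results can be illustrated with a minimal sketch. This is not the authors' implementation: a nearest-centroid rule with cosine similarity stands in for the modified random forest classifier, and the function names, the 0.9 threshold, and the toy intensity vectors are all hypothetical choices made only for illustration. The sketch shows one idea only, namely that samples scoring above a similarity threshold are added to the identified set, which is then reused in the next round.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def expand_labels(labelled, unlabelled, threshold=0.9, max_rounds=10):
    """Iteratively move confidently scored samples into the labelled set.

    labelled:   dict mapping taxon name -> list of intensity vectors
    unlabelled: dict mapping sample id -> intensity vector
    threshold:  hypothetical similarity cut-off (an assumption, not from the paper)
    Returns (assigned, remaining): sample id -> taxon, and the samples left over.
    """
    labelled = {taxon: list(vecs) for taxon, vecs in labelled.items()}
    remaining = dict(unlabelled)
    assigned = {}
    for _ in range(max_rounds):
        # Stand-in for retraining the classifier on the current labelled set.
        centroids = {taxon: centroid(vecs) for taxon, vecs in labelled.items()}
        newly = {}
        for sample_id, vec in remaining.items():
            scores = {taxon: cosine(vec, c) for taxon, c in centroids.items()}
            best = max(scores, key=scores.get)
            if scores[best] >= threshold:
                newly[sample_id] = best
        if not newly:
            break  # nothing passed the threshold, so the labelled set is stable
        for sample_id, taxon in newly.items():
            labelled[taxon].append(remaining.pop(sample_id))
            assigned[sample_id] = taxon
    return assigned, remaining


# Toy data: three-bin "spectra" for two made-up taxa.
seed = {"taxon_A": [[1, 0, 0], [0.9, 0.1, 0]],
        "taxon_B": [[0, 1, 0], [0, 0.9, 0.1]]}
unknown = {"s1": [0.95, 0.05, 0], "s2": [0, 1, 0.05], "s3": [0, 0, 1]}
assigned, remaining = expand_labels(seed, unknown)
```

Samples that never reach the threshold are left unassigned rather than forced into a taxon, which loosely mirrors the abstract's remark that some identifications may still require manual interpretation.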

Highlights

  • Biomolecular methods for species identification are increasingly being utilised in the study of changing environments, both at the microscopic and macroscopic levels

  • Biomarker-based methods have reached species-level identification with high accuracy in both bacteria and yeasts [14]. Their performance in identifying ancient species has been less satisfactory, because it is difficult to find well-defined biomarkers that are not affected by the large variations caused by differing levels of decay over time, which greatly reduce relative concentrations. Ancient collagen, the main target of peptide mass fingerprinting (PMF) of archaeological and palaeontological specimens, can contain many post-translational modifications (PTMs), some of which are affected by decay

  • Acquisition of matrix-assisted laser desorption/ionization time-of-flight (MALDI-ToF) mass spectrometry data: mass spectrometry data were taken from a previous publication [26], in which microfaunal specimens were recovered from a single archaeological site, Pin Hole Cave (UK), with additional specimens from the spoil heap and elsewhere in the cave

Summary

Introduction

Biomolecular methods for species identification are increasingly being utilised in the study of changing environments, at both the microscopic and macroscopic levels. Biomarker-based methods have reached species-level identification with high accuracy in both bacteria and yeasts [14]. Their performance in identifying ancient species has been less satisfactory, because it is difficult to find well-defined biomarkers that are not affected by the large variations caused by differing levels of decay over time, which greatly reduce relative concentrations. Ancient collagen, the main target of peptide mass fingerprints (PMFs) derived from archaeological and palaeontological specimens, can contain many post-translational modifications (PTMs), some of which are affected by decay. Multivariate methods such as principal component analysis and partial least squares regression have been used in addition to biomarkers to separate different taxa [15, 16].
