General proteomics research for fundamental science typically addresses laboratory- or patient-derived samples of known origin and composition. However, in a few research areas, such as environmental proteomics, clinical identification of infectious organisms, archeology, art/cultural history, and forensics, attributing a protein-containing sample to the organisms that produced it is a central focus. A small number of groups have approached this problem and developed software tools for taxonomic characterization and/or identification using bottom-up proteomics. Most such tools identify peptides via database search, and many rely on organism-specific peptides as markers. Our group recently introduced MARLOWE, a software tool for taxonomic characterization of unknown samples based on de novo peptide identification and signal-erosion-resistant strong peptides, which are shared peptides distributed in a taxonomy-dependent manner. In the current work, we further characterize the utility of MARLOWE using publicly available proteomics data from forensically relevant samples. MARLOWE characterizes samples based on their protein profile and returns ranked lists of potential contributing organisms, along with taxonomic scores based on the strong peptides shared between organisms. Overall, the correct characterization rate ranges from 44% to 100%, depending on the sample type and data acquisition parameters, with lower rates associated with lower-quality data sets. MARLOWE successfully characterizes true contributors and close relatives, and provides sufficient specificity to distinguish certain microbial species. It can thus provide insight into potential taxonomic sources for a wide range of sample types without prior assumptions about sample contents. This approach can find utility in forensic science and, more broadly, in bioanalytical applications that use proteomics for taxonomic characterization.