Unipept, a pioneering software tool in metaproteomics, has significantly advanced the analysis of complex ecosystems by providing both taxonomic and functional insights from environmental samples. From the outset, Unipept focused on tryptic peptides, exploiting the predictability and consistency of trypsin digestion to construct its protein reference database efficiently. However, the evolving landscape of proteomics and emerging fields such as immunopeptidomics call for a more versatile approach that extends beyond tryptic peptides. In this article, we present a major update to Unipept's underlying index structure, which is now powered by a Sparse Suffix Array index. This index enables the analysis of semitryptic peptides, peptides with missed cleavages, and nontryptic peptides, including those encountered in immunopeptidomics (e.g., MHC and HLA peptides). The new index benefits every tool in the Unipept ecosystem: the web application, the desktop tool, the application programming interface (API), and the command line interface. A benchmark study shows significantly improved performance in handling missed cleavages while preserving the same level of accuracy.
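The reason a suffix array lifts the tryptic restriction is that every substring of a protein is a prefix of some suffix, so any peptide (tryptic, semitryptic, or nontryptic) can be located by binary search over the sorted suffixes. A *sparse* suffix array stores only the suffixes starting at every k-th position, cutting memory to roughly n/k entries at the cost of k searches per query. The toy sketch below illustrates that idea; it is not Unipept's implementation, and the function names, the sparseness factor `k`, and the `-` separator between proteins are all illustrative assumptions.

```python
def build_sparse_suffix_array(text: str, k: int) -> list[int]:
    """Sort only the suffixes that start at positions divisible by k.

    Naive O(n^2 log n) construction, fine for illustration only
    (a real index would use a linear-time suffix-sorting algorithm).
    """
    return sorted(range(0, len(text), k), key=lambda i: text[i:])


def _suffix_range(text: str, sa: list[int], prefix: str) -> tuple[int, int]:
    """Return the half-open range [start, end) of sampled suffixes starting with prefix."""
    m = len(prefix)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= prefix
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < prefix:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > prefix
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= prefix:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def search(text: str, sa: list[int], k: int, peptide: str) -> list[int]:
    """Find every start position of peptide in text using a sparse suffix array.

    An occurrence at position i always covers a sampled position i + j with
    0 <= j < k, so we search for peptide[j:] and then verify the j skipped
    characters directly in the text. Requires len(peptide) >= k.
    """
    if len(peptide) < k:
        raise ValueError("peptide must be at least k residues long")
    hits = set()
    for j in range(k):
        start, end = _suffix_range(text, sa, peptide[j:])
        for idx in range(start, end):
            pos = sa[idx] - j
            if pos >= 0 and text[pos:pos + j] == peptide[:j]:
                hits.add(pos)
    return sorted(hits)


# Hypothetical mini "database": proteins joined by a separator that never
# occurs in a peptide, so no match can span two proteins.
proteins = ["MKWVTFISLLLLFSSAYS", "MAGLPRRIIKETQRLL", "PEPTIDEKR"]
text = "-".join(proteins)
sa = build_sparse_suffix_array(text, k=3)
print(search(text, sa, 3, "TIDE"))  # a nontryptic peptide
```

Here `"TIDE"` does not end in K or R, so a lookup table of tryptic peptides would miss it, while the suffix array finds it at position 39 of the concatenated text. Larger `k` shrinks the index but multiplies the binary searches per query and raises the minimum peptide length.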