IgMAT: immunoglobulin sequence multi-species annotation tool for any species including those with incomplete antibody annotation or unusual characteristics

Daniel Dorey-Robinson,Giuseppe Maccari,John A Hammond

doi:10.1186/s12859-023-05624-2

Abstract

BackgroundThe advent and continual improvement of high-throughput sequencing technologies has made immunoglobulin repertoire sequencing accessible and informative regardless of study species. However, to fully map dynamic changes in polyclonal responses precise framework and complementarity determining region annotation of rearranging genes is pivotal. Most sequence annotation tools are designed primarily for use with human and mouse antibody sequences which use databases with fixed species lists, applying very specific assumptions which select against unique structural characteristics. For this reason, data agnostic tools able to learn from presented data can be very useful with new species or with novel datasets.ResultsWe have developed IgMAT, which utilises a reduced amino acid alphabet, that incorporates multiple HMM alignments into a single consensus to automatically annotate immunoglobulin sequences from most organisms. Additionally, the software allows the incorporation of user defined databases to better represent the species and/or antibody class of interest. To demonstrate the accuracy and utility of IgMAT, we present analysis of sequences extracted from structural data and immunoglobulin sequence datasets from several different species.ConclusionsIgMAT is fully open-sourced and freely available on GitHub (https://github.com/TPI-Immunogenetics/igmat) for download under GPLv3 license. It can be used as a CLI application or as a python module to be integrated in custom scripts.

Full Text