Sanger sequencing remains widely used in many experimental contexts, often in combination with flow cytometry index sorting of specific cell populations. However, existing software cannot automate quality control (QC) of raw Sanger sequencing data and integrate it with flow cytometry information at large scale. Here, we introduce scifer, an R package available in the latest release of Bioconductor (3.20), and demonstrate its ability to integrate these data types through analyses of B cell and T cell receptor sequences. Scifer preprocesses raw data from index sorts and immune receptor Sanger sequencing. It identifies high-quality sequences based on user-selected parameters, such as length, Phred scores, and heavy-chain complementarity-determining region 3 (HCDR3) quality. As a result, the quality of germline assignments is significantly increased and spurious variable gene mutations are reduced. Scifer is automated and can process thousands of sequences in less than an hour. Its output includes quality control reports, FASTA files, summarized tables, and electropherograms for manual inspection. In summary, scifer is user-friendly, widely applicable software that speeds up the analysis of immune receptor repertoire sequences.
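
To illustrate the kind of filtering described above, the following is a minimal sketch of a length and Phred score filter. It is not scifer's implementation or API (scifer is an R package); the `Read` type, the `passes_qc` function, and the threshold values are hypothetical and chosen only for illustration. The one fixed fact it relies on is the standard Phred definition, in which a score Q corresponds to a base-call error probability of 10^(-Q/10).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Read:
    """Hypothetical container for one Sanger read (not a scifer type)."""
    name: str
    sequence: str
    phred: List[int]  # one Phred score per base call


def error_probability(q: float) -> float:
    """Convert a Phred score Q to its error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def passes_qc(read: Read, min_length: int = 200, min_mean_phred: float = 30.0) -> bool:
    """Keep a read only if it is long enough and its mean Phred score is high enough.

    The default thresholds are illustrative assumptions, not values from the paper.
    """
    if len(read.sequence) < min_length or not read.phred:
        return False
    mean_q = sum(read.phred) / len(read.phred)
    return mean_q >= min_mean_phred


def filter_reads(reads: List[Read], **thresholds) -> List[Read]:
    """Return the subset of reads that pass the QC thresholds."""
    return [r for r in reads if passes_qc(r, **thresholds)]
```

A caller would build `Read` objects from parsed trace files and pass them to `filter_reads`, optionally overriding `min_length` and `min_mean_phred`. A region-specific check, such as the HCDR3 quality mentioned above, could be added as a further condition over the relevant slice of `phred`.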