Abstract

The T-cell receptor (TCR) population in humans consists of highly diversified heterodimers that govern recognition of the antigen-major histocompatibility complex. Tremendous TCR sequence diversity is produced by somatic recombination of several TCR gene loci, each consisting of multiple gene segments. Next-generation sequencing (NGS) has enabled comprehensive profiling of the TCR repertoire across physiological and disease conditions, generating much interest in using TCR-seq to assess T-cell diversity. However, during NGS library construction and sequencing, errors and enzymatic inefficiencies can compromise the accuracy of the final data, particularly in calling the VD and VDJ recombined regions and in subsequent clonotype assignment. To increase the accuracy of NGS, Unique Molecular Identifiers (UMIs), which consist of short random nucleotide sequences, can be used to mark the original molecules in an NGS library, allowing correction of errors and biases. There are two well-studied technical limitations in applying UMIs: (1) UMI sequences tend to collide when the number of input molecules is large, and (2) UMI sequences are not insulated from PCR and sequencing errors. Many computational approaches have been published to address these limitations. Among them, very few can resolve UMI collisions, and the error models implemented for handling UMI sequencing errors are over-simplified. Here we report a novel strategy and UMI structure that uses more complex UMIs, which are longer and of variable length. This minimizes UMI collisions while maximizing sequencing quality. Our UMI analysis pipeline, “UMI-nea”, handles not only substitution errors but also indel errors and UMIs of different lengths. To mitigate the resulting increase in computational burden, we developed a novel computational framework that processes sequence comparisons in parallel.
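The two limitations above can be made concrete with a small sketch. It is not the UMI-nea implementation: it assumes UMIs are drawn uniformly at random from all 4^L sequences (real libraries are not perfectly uniform) and uses plain Levenshtein distance as a stand-in for an error measure that counts both substitutions and indels. The function names are hypothetical.

```python
import math


def expected_collision_fraction(n_molecules: int, umi_length: int) -> float:
    """Expected fraction of molecules whose UMI is not unique.

    Assumption: each molecule draws its UMI uniformly from 4**umi_length
    possible sequences, independently of the others.
    """
    space = 4 ** umi_length
    # Expected number of distinct UMIs: space * (1 - (1 - 1/space)**n),
    # written with log1p/expm1 for numerical stability when space is large.
    distinct = space * -math.expm1(n_molecules * math.log1p(-1.0 / space))
    return 1.0 - distinct / n_molecules


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion from a
                           cur[j - 1] + 1,             # insertion into a
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    for length in (8, 12, 18):
        frac = expected_collision_fraction(100_000, length)
        print(f"UMI length {length}: expected collision fraction {frac:.6f}")
    # A single deleted base is one edit, even though the UMI lengths differ.
    print(levenshtein("ACGTACGT", "ACGACGT"))
```

Under this uniform-draw assumption, lengthening the UMI shrinks the expected collision fraction quickly, and an edit distance that allows indels can compare UMIs of unequal length, which a Hamming-distance comparison cannot.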
To account for the variable dispersion of PCR efficiency across molecules, and for error-bearing UMIs in libraries with different input amounts and sequencing depths, we also developed a statistical framework that uses a negative binomial model and the single-cell knee plot to set a dynamic threshold for estimating the number of original molecules. We verified UMI-nea on several simulated datasets and demonstrated that it achieves >99% completeness and homogeneity in recovering the original molecule count across various error rates and UMI lengths, outperforming the existing tools and methods we compared it with. We applied UMI-nea to profile the TCR repertoire of 8 PBMC samples sequenced on different Illumina platforms at different sequencing depths, and observed >85% reproducibility of clonotype calls on all samples. To test the sensitivity and specificity of UMI-nea, we sequenced pure cell line samples and cell line spike-in samples at different ratios and observed very high recall and precision.

Citation Format: Jixin Deng, Jingxiao Zhang, Song Tian, Samuel J. Rulli, Hong Xu, John DiCarlo, Eric Lader. UMI-nea: A fast and robust UMI analysis approach to accurately identify and quantify TCR repertoire from targeted RNA sequencing with wide range of input molecules [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 7425.
