Abstract

Since the first human gene therapy clinical trial was initiated 25 years ago, many technological advances have dovetailed into the field, moving personalized genetic treatments from concept towards routine reality. In particular, widespread adoption of next-generation sequencing technologies has enabled faster vector integration profiling to assay for potential genotoxicities. Genomic data generation has grown faster than Moore's law for computing power since 2007, making data analysis the new bottleneck in the workflow. Faster solutions are needed to keep up with the exponential growth in accumulated data. Our goal is to compress sequence data generation and downstream analysis into 5 working days, so that the genotoxicity assay can support time-sensitive clinical decision making. The time and cost savings of integrated, streamlined data processing should translate into easier and faster implementation in clinical settings. We use non-viral transfection, based on the Sleeping Beauty transposition system, to reprogram T cells for adoptive immunotherapy. Sleeping Beauty transposons invariably insert into “TA” sites via a cut-and-paste mechanism driven by their cognate transposase. We exploit this property to develop SBhashmap, an analysis platform that quickly locates genomic integration sites and the information associated with them. We chose hash tables over other lookup data structures for speed: a hash lookup takes constant expected time regardless of table size, so the advantage becomes more apparent as the number of entries grows. SBhashmap consists of pre-populated hash tables holding over 150 million entries, corresponding to the “TA” sites in the human genome, together with hash functions for fast site matching. Cataloging the TA sites ahead of time cuts mapping time significantly.
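The idea of cataloging TA sites ahead of time can be illustrated with a minimal sketch. It is not the SBhashmap implementation. It assumes a toy genome, indexes only the forward strand, and keys each entry on the k bases starting at a TA dinucleotide. It also assumes the genomic part of a transposon/genome junction read has already been trimmed so that it starts at the TA site. The function names `build_ta_index` and `map_junction` and the choice of k are hypothetical.

```python
from collections import defaultdict


def build_ta_index(genome: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer that starts at a 'TA' site to all 0-based positions where it occurs.

    Hypothetical helper: forward strand only, no reverse-complement entries.
    """
    index: dict[str, list[int]] = defaultdict(list)
    pos = genome.find("TA")
    while pos != -1:
        kmer = genome[pos:pos + k]
        if len(kmer) == k:  # skip TA sites too close to the end of the sequence
            index[kmer].append(pos)
        pos = genome.find("TA", pos + 1)
    return dict(index)


def map_junction(index: dict[str, list[int]], junction: str, k: int) -> list[int]:
    """Return candidate TA-site positions for an already-trimmed genomic junction sequence.

    Assumes the transposon end has been removed, so a genuine SB junction starts with 'TA'.
    """
    if not junction.startswith("TA") or len(junction) < k:
        return []
    # A single hash lookup replaces an alignment search over the whole genome.
    return index.get(junction[:k], [])


if __name__ == "__main__":
    genome = "GGCTAACGTTAGCCTATGCA"  # toy sequence, not real genomic data
    index = build_ta_index(genome, k=6)
    print(map_junction(index, "TAGCCTATG", k=6))  # prints [9]
```

All the expensive work happens once, in `build_ta_index`. Each read then costs one dictionary lookup. A k-mer that maps to more than one position flags an ambiguous, repeat-derived integration site.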
To benchmark SBhashmap, we compared its transposon integration mapping against BLAST and BLAT matching on human genomic sequences and found a 20-fold reduction in processing time with similar accuracy. This kind of hash map is particularly efficient because the maximum number of entries is known in advance. The bucket array can therefore be allocated once at a fixed, optimal size and never needs resizing, since no insertions or deletions occur after construction. Because the hash functions are collision-free, each key maps to its own slot, and the keys themselves need not be stored in the tables. With minor adaptations, SBhashmap can process genomic data from experimentally relevant model organisms such as mice, worms, and fruit flies.
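The abstract does not say which collision-free hash functions SBhashmap uses. The sketch below substitutes the simplest collision-free scheme available: a direct-address table indexed by the 2-bit encoding of each TA-anchored k-mer. Distinct k-mers always map to distinct integers in [0, 4^k), so the table has these properties:

- it is allocated once at its final size and never resized;
- no keys are stored, because the slot index is the key;
- each slot holds only a genomic position or a sentinel.

The names `encode`, `build_table`, `lookup`, `EMPTY` and `MULTI` are hypothetical. The toy k = 6 is an assumption made for illustration.

```python
from array import array

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
EMPTY, MULTI = -1, -2  # hypothetical sentinels; real positions are >= 0


def encode(kmer: str) -> int:
    """Pack a k-mer into 2 bits per base; distinct k-mers always get distinct integers."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def build_table(genome: str, k: int) -> array:
    """Direct-address table of TA-anchored k-mers, allocated once with 4**k slots."""
    table = array("q", [EMPTY]) * (4 ** k)  # fixed size: never resized
    for pos in range(len(genome) - k + 1):
        kmer = genome[pos:pos + k]
        if not kmer.startswith("TA") or any(b not in CODE for b in kmer):
            continue  # not a TA site, or contains N / other ambiguity codes
        slot = encode(kmer)
        # Keys are never stored: the slot index itself identifies the k-mer.
        table[slot] = pos if table[slot] == EMPTY else MULTI
    return table


def lookup(table: array, kmer: str) -> int:
    """Return the 0-based TA-site position, EMPTY if absent, MULTI if the k-mer repeats."""
    if 4 ** len(kmer) != len(table) or any(b not in CODE for b in kmer):
        return EMPTY  # wrong length or non-ACGT query cannot match any slot
    return table[encode(kmer)]


if __name__ == "__main__":
    table = build_table("GGCTAACGTTAGCCTATGCA", k=6)  # toy sequence
    print(len(table), lookup(table, "TAGCCT"))  # prints 4096 9
```

A dense table trades memory for zero collisions, and the cost grows quickly: at k = 16 it would need about 4.3 billion slots. A genome-scale index would more plausibly use a minimal perfect hash built over only the k-mers actually present. That is a different technique from the one sketched here.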
