Abstract

The analysis of the genomic distribution of viral vector integration sites is a key step in hematopoietic stem cell-based gene therapy applications: it makes it possible to assess both the safety and the efficacy of the treatment and to study basic aspects of hematopoiesis and stem cell biology. Identifying vector integration sites requires ad hoc bioinformatics tools with stringent requirements in terms of computational efficiency, flexibility, and usability. We developed VISPA (Vector Integration Site Parallel Analysis), a pipeline for automated integration site identification and annotation that runs in a distributed environment and is accessed through a simple Galaxy web interface. VISPA was successfully used for the bioinformatics analysis of the follow-up of two lentiviral vector-based hematopoietic stem cell gene therapy clinical trials. Our pipeline provides a reliable and efficient tool to assess the safety and efficacy of integrating vectors in clinical settings.

Highlights

  • Viral vectors, due to their ability to permanently integrate in a target genome, are used to achieve the stable genetic modification of therapeutically relevant cells and their progeny

  • Reliability of VISPA and other tools for integration site (IS) analysis: we assessed the reliability of our tool and other available software (Mavric [33], SeqMap [34] and QuickMap [35]) on an in silico dataset of 455 human sequences that simulate ISs with pre-determined genomic coordinates and differ in length, sequence complexity and mappability

  • IS analysis is an essential step for assessing the safety and efficacy of molecular therapies that use hematopoietic stem cells genetically modified via integrating viral vectors

Summary

Background

Viral vectors, due to their ability to permanently integrate in a target genome, are used to achieve the stable genetic modification of therapeutically relevant cells and their progeny.

[...] file for each barcode); the LTR and LC sequences are subsequently removed from each read to isolate genomic fragments; in the next step, reads are mapped to the reference genome and several filters are applied to ensure unambiguous alignment; after that, ISs that fall in the same 3 bp window are merged together; finally, all ISs are annotated by listing nearby genomic features (for example, genes).

Integration site merging. Due to the possible presence of technical biases, we applied a previously validated [6,26,27] 3 bp tolerance window on the genomic position of the IS (that is, the starting point of the alignment): all reads in R that lie in the same window are merged into a single locus, represented by the first position in the window itself. This is achieved by sorting reads by their starting position on each reference chromosome and running a sliding window [28] on the sorted list.

While most pipeline tools are executed on single CPU cores assigned to them by the RM, the distributed alignment and filtering step runs concurrently on cluster subsections managed by Hadoop (Figure 3).
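The integration site merging step described above (sort reads by starting position on each chromosome, then slide a 3 bp window over the sorted list) can be illustrated with a minimal Python sketch. This is not the VISPA implementation: the function name `merge_integration_sites`, the `(chromosome, start)` read representation, and the reading of a 3 bp window as the positions `anchor` to `anchor + 2` are assumptions made for illustration, and strand information is ignored.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical minimal read record: (chromosome, alignment start position).
Read = Tuple[str, int]


def merge_integration_sites(
    reads: Iterable[Read], window: int = 3
) -> Dict[Tuple[str, int], int]:
    """Merge reads whose start positions fall in the same window.

    Returns a mapping from (chromosome, first position in the window)
    to the number of reads merged into that locus.
    """
    # Group alignment start positions by reference chromosome.
    by_chrom: Dict[str, List[int]] = defaultdict(list)
    for chrom, start in reads:
        by_chrom[chrom].append(start)

    merged: Dict[Tuple[str, int], int] = {}
    for chrom, starts in by_chrom.items():
        starts.sort()  # the sliding window needs sorted positions
        anchor = None
        for pos in starts:
            # Assumption: a window covers anchor .. anchor + window - 1.
            # A read outside the current window opens a new locus.
            if anchor is None or pos - anchor >= window:
                anchor = pos
                merged[(chrom, anchor)] = 0
            merged[(chrom, anchor)] += 1
    return merged


if __name__ == "__main__":
    example_reads = [
        ("chr1", 101), ("chr1", 100), ("chr1", 102),
        ("chr1", 103), ("chr1", 500), ("chr2", 100),
    ]
    for locus, count in sorted(merge_integration_sites(example_reads).items()):
        print(locus, count)
```

In this sketch each merged locus is keyed by the first position of its window, as in the text, and the read count per locus is kept because it is a natural by-product of the merge; whether the original pipeline stores counts at this stage is not stated in the source.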

Results and discussion
Conclusions