This chapter describes the computational pipeline for processing and visualizing Protec-Seq data. Protec-Seq is a method for the purification and genome-wide mapping of double-stranded DNA protected by a specific protein at both ends. In the published case, the protein of choice was Saccharomyces cerevisiae Spo11, a conserved topoisomerase-like enzyme that makes meiotic double-strand breaks (DSBs) to initiate homologous recombination, ensuring proper segregation of homologous chromosomes and fertility. The isolated DNA molecules were thus termed double DSB (dDSB) fragments. They were found to be segments 34 to several hundred base pairs long that are generated by Spo11 and are enriched at DSB hotspots, which are sites of topological stress. To allow quantitative comparisons between dDSB profiles across experiments, we implemented calibrated chromatin immunoprecipitation sequencing (ChIP-Seq) using the meiosis-competent yeast species Saccharomyces kudriavzevii as the calibration strain. Here, we provide a detailed description of the computational methods for processing, analyzing, and visualizing Protec-Seq data, comprising the download of the raw data, the calibrated genome-wide alignments, and the scripted creation of either arc plots or Hi-C-style heatmaps to illustrate chromosomal regions of interest. The workflow is based on Linux shell scripts (including wrappers for publicly available, open-source software) as well as R scripts, and its modular structure makes it highly customizable.