Abstract

Background: Researchers in genome engineering are rapidly refining techniques for site-specific gene editing. CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats) is one of the most recent of these techniques. It consists of a Cas9 nuclease and a single guide RNA (sgRNA) that directs the nuclease to the required target site in the DNA. However, in addition to the intended on-target site, a genome may contain multiple off-target sites, which is a potential drawback of the technique. Off-target effects of an sgRNA are typically examined with lab-based assays, which raises concerns about cost, time, and efficacy. Deep learning techniques have been used effectively to analyze biological data and to identify off-target sites in the genome. This research aims to identify CRISPR off-targets within the genome and to predict genome vulnerability to unexpected mutations using a deep learning approach.

Method: This study presents a two-step data preprocessing and off-target prediction method that may be used to determine genome instability. First, a raw DNA sequence was preprocessed and subdivided into multiple 20-bp substrings, which were then categorized into perfect matches and mismatches. Second, average weights, which indicate the genome's vulnerability, were assigned based on the CFD (Cutting Frequency Determination) of the off-targets. Finally, the deep neural network model was trained using random genomic data. To identify off-targets and predict genome vulnerability, a Deep-RPA model was trained and tested on the DNA sequence of the Arabidopsis model plant.

Results: Deep-RPA accurately predicted off-targets and genome vulnerabilities on different sample sequences, which supports the accuracy and validity of the model. The proposed model was evaluated through stratified five-fold cross-validation and attained an average accuracy of 99% for predicting genome vulnerability. The results showed an improvement over existing solutions.

Conclusion: The proposed design, which uses a 1D CNN, has shown significant improvement in text-based sequence analysis and has outperformed current methods by a wide margin.
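
The preprocessing step described under Method (splitting a sequence into 20-bp substrings and sorting them into perfect matches and mismatches) can be illustrated with a minimal sketch. This is not the authors' Deep-RPA pipeline. The function names, the use of overlapping windows, the single-strand scan, and the `max_mismatches` cutoff of 4 are all assumptions made for illustration. PAM handling and CFD weighting are deliberately left out, because the abstract does not specify them.

```python
from typing import Iterator, List, Tuple

WINDOW = 20  # substring length stated in the abstract


def windows(seq: str, size: int = WINDOW) -> Iterator[Tuple[int, str]]:
    """Yield (start, substring) for every overlapping window of `size` bases.

    Overlapping windows are an assumption; the abstract only says the
    sequence is subdivided into multiple 20-bp substrings.
    """
    seq = seq.upper()
    for start in range(len(seq) - size + 1):
        yield start, seq[start:start + size]


def mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def classify_sites(
    genome: str, guide: str, max_mismatches: int = 4
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Split candidate sites into perfect matches and mismatched sites.

    Returns (perfect, off) where `perfect` holds start positions with zero
    mismatches and `off` holds (start, mismatch_count) pairs with between
    1 and `max_mismatches` mismatches. The cutoff is hypothetical.
    """
    guide = guide.upper()
    perfect: List[int] = []
    off: List[Tuple[int, int]] = []
    for start, sub in windows(genome, len(guide)):
        n = mismatches(sub, guide)
        if n == 0:
            perfect.append(start)
        elif n <= max_mismatches:
            off.append((start, n))
    return perfect, off
```

Calling `classify_sites(genome, guide)` with a 20-bp `guide` returns the two lists in one pass over the sequence. In a fuller pipeline, the mismatched sites would then be scored (for example with CFD) before being passed to a model.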
