Abstract

As a third-generation gene editing technology, Crispr/Cas9 has a wide range of applications. Successful editing depends on a functional complex of sgRNA and Cas9 protein acting on the target gene, so sgRNAs with high specificity and high on-target cleavage efficiency make the process more accurate and efficient. Although many sophisticated machine learning and deep learning models already predict sgRNA on-target cleavage efficiency, prediction accuracy remains to be improved. XGBoost performs well at classification because an ensemble model can overcome the weaknesses of a single classifier, and we aim to improve the prediction of sgRNA on-target activity by introducing XGBoost into the model. We present a novel machine learning framework that combines a convolutional neural network (CNN) with XGBoost to predict sgRNA on-target knockout efficacy. The framework, called CNN-XG, has two main parts: a CNN feature extractor that automatically extracts features from sequences, and an XGBoost predictor that is applied to the features extracted by convolution. Experiments on commonly used datasets show that CNN-XG performs significantly better than other existing frameworks in classification mode.

Highlights

  • The Crispr/Cas9 system is derived from the process by which phages infect bacteria. Crispr refers to short repeat sequences that are regularly spaced in clusters, with approximately the same length and specificity [1]

  • The single guide RNA (sgRNA) sequence and the epigenetic sequence are each converted into a 4 × 23 binary matrix via one-hot encoding. The two encoded matrices are fed into the convolutional neural network (CNN) and RF for feature extraction, and XGBoost is trained on the extracted features

  • CNN-XG achieves the best performance in both area under the receiver operating characteristic curve (AUROC) values and Spearman coefficients. These results indicate that CNN-XG predicts sgRNA on-target activity better than CNN or XGBoost working alone, further confirming the feasibility and effectiveness of combining CNN and XGBoost and showing the superiority of the hybrid model
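
The one-hot encoding mentioned in the highlights can be illustrated with a minimal sketch. It covers only the nucleotide sequence, since the source does not describe how the epigenetic sequence is encoded. The row order A, C, G, T, the function name `one_hot_encode`, and the example sequence are assumptions made for illustration and are not taken from the paper.

```python
# Minimal sketch: one-hot encode a 23-nt target sequence into a 4 x 23 binary matrix.
# Assumption: rows are ordered A, C, G, T. The source states only the 4 x 23 shape.
BASES = "ACGT"


def one_hot_encode(seq, length=23):
    """Return a 4 x `length` binary matrix (a list of 4 rows) for a DNA sequence."""
    seq = seq.upper()
    if len(seq) != length:
        raise ValueError(f"expected {length} nt, got {len(seq)}")
    unknown = set(seq) - set(BASES)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    # One row per base; a 1 marks the positions where that base occurs.
    return [[1 if base == row_base else 0 for base in seq] for row_base in BASES]


# Hypothetical 23-nt example: a 20-nt protospacer followed by a 3-nt PAM.
example = "GACGCATAAAGATGAGACGCTGG"
matrix = one_hot_encode(example)

for row_base, row in zip(BASES, matrix):
    print(row_base, "".join(str(bit) for bit in row))
```

Each column of the resulting matrix contains exactly one 1, so the matrix can be passed to a convolutional layer as a four-channel signal of length 23.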


Summary

Introduction

Academic Editor: Vladimir

Crispr (clustered regularly interspaced short palindromic repeats) refers to short repeat sequences that are regularly spaced in clusters, with approximately the same length and specificity [1]. The Crispr/Cas system is derived from the process by which phages infect bacteria: Crispr is a common immune system that bacteria use to fight viruses or exogenous DNA. By integrating fragments of invading DNA into its spacer DNA, the Crispr/Cas system directs the corresponding single guide RNA (sgRNA) to recognize, locate, and cut target fragments of viral DNA. The target is recognized through complementary base pairing at a double-stranded DNA target position adjacent to a protospacer adjacent motif (PAM) [2,3]. Although Crispr/Cas relies on complementary base pairing for specific recognition, the Cas nucleases tolerate base mismatches between sgRNA and target DNA sequences. Effective evaluation of off-target effects and accurate prediction of sgRNA on-target knockout efficacy have therefore become the focus of Crispr/Cas system research.

