Abstract
Clustered regularly interspaced short palindromic repeats and associated proteins (CRISPR-Cas) allows more specific and efficient gene editing than all previous genetic engineering systems. These exciting discoveries stem from the finding of the CRISPR system being an adaptive immune system that protects the prokaryotes against exogenous genetic elements such as phages. Despite the exciting discoveries, almost all knowledge about CRISPRs is based only on microorganisms that can be isolated, cultured and sequenced in labs. However, about 95% of bacterial species cannot be cultured in labs. The fast accumulation of metagenomic data, which contains DNA sequences of microbial species from natural samples, provides a unique opportunity for CRISPR annotation in uncultivable microbial species. However, the large amount of data, heterogeneous coverage and shared leader sequences of some CRISPRs pose challenges for identifying CRISPRs efficiently in metagenomic data. In this study, we developed a CRISPR finding tool for metagenomic data without relying on generic assembly, which is error-prone and computationally expensive for complex data. Our tool can run on commonly available machines in small labs. It employs properties of CRISPRs to decompose generic assembly into local assembly. We tested it on both mock and real metagenomic data and benchmarked the performance with state-of-the-art tools. The source code and the documentation of metaCRISPR is available at https://github.com/hangelwen/metaCRISPR CONTACT: yannisun@msu.edu.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have