Abstract
Single-cell sequencing (SCS) data are sparse and fluctuate in coverage, which makes calling single nucleotide variants (SNVs) and indels more difficult than with data obtained from next-generation sequencing (NGS). Furthermore, most existing variant-calling methods cannot effectively call whole-genome SNVs and indels from SCS data. In this study, we propose scSNVIndel, a new method for efficiently identifying SNVs and indels from SCS data. scSNVIndel is built on a bidirectional long short-term memory (Bi-LSTM) network and integrates new natural language processing (NLP) techniques. It extracts features automatically and accurately calls SNVs and indels from SCS data, which are characterized by uneven and discontinuous coverage. Moreover, unlike DeepVariant, scSNVIndel does not convert the sequence into an image; it calls variants directly from the sequence and thereby retains valuable information in the SCS data. The results show that scSNVIndel achieves higher accuracy and recall in variant calling than other existing methods. scSNVIndel is open source and available at https://github.com/CSuperlei/scSNVIndel, and usage instructions are published at https://www.aiguqu.com/2020/06/18/scSNVIndel/.