Abstract

Single-cell sequencing (SCS) data are sparse and subject to coverage fluctuations, which makes calling single nucleotide variants (SNVs) and indels more difficult than with data obtained from next-generation sequencing (NGS). Furthermore, most existing variant-calling methods cannot effectively call whole-genome SNVs and indels from SCS data. In this study, we propose scSNVIndel, a new method for the efficient identification of SNVs and indels from SCS data. scSNVIndel is built on a bidirectional long short-term memory (Bi-LSTM) network and integrates recent natural language processing (NLP) techniques. It extracts features automatically and accurately calls SNVs and indels from SCS data, which are characterized by uneven and discontinuous coverage. Moreover, unlike DeepVariant, scSNVIndel does not convert the sequence into an image; it calls variants from the sequence directly and thereby retains valuable information from the SCS data. The results show that scSNVIndel achieves higher accuracy and recall in variant calling than existing methods. scSNVIndel is open source and available at https://github.com/CSuperlei/scSNVIndel, and usage instructions are published at https://www.aiguqu.com/2020/06/18/scSNVIndel/.
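
To make the idea of calling variants "from the sequence directly" more concrete, the following is a minimal sketch of how a window of bases might be turned into integer tokens, in the way NLP models consume text, before being passed to a sequence model such as a Bi-LSTM. It is an illustration only: the vocabulary `BASE_TO_ID`, the function `encode_window`, and the padding scheme are assumptions made for this example and are not taken from the scSNVIndel source code.

```python
from typing import Dict, List

# Hypothetical vocabulary (an assumption for illustration, not scSNVIndel's):
# 0 is reserved for padding, "N" for unknown bases, "-" for alignment gaps.
BASE_TO_ID: Dict[str, int] = {
    "<pad>": 0, "A": 1, "C": 2, "G": 3, "T": 4, "N": 5, "-": 6,
}


def encode_window(bases: str, width: int) -> List[int]:
    """Map a string of bases to exactly `width` integer ids.

    Longer inputs are truncated, shorter inputs are right-padded with 0,
    and any unrecognised character is mapped to the id of "N".
    """
    ids = [BASE_TO_ID.get(b.upper(), BASE_TO_ID["N"]) for b in bases[:width]]
    return ids + [BASE_TO_ID["<pad>"]] * (width - len(ids))


tokens = encode_window("ACG-TN", 8)
print(tokens)  # [1, 2, 3, 6, 4, 5, 0, 0]
```

A token sequence of this kind keeps each base, gap, and unknown call as a distinct symbol, whereas an image-based encoding would first render the pileup as pixels.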

