Abstract

Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) is rapidly advancing our understanding of the cellular composition of complex tissues and organisms. The similarity in data structure and features between scRNA-seq and scATAC-seq makes it feasible to identify cell types in scATAC-seq data with traditional supervised machine learning methods. Here, we evaluated 6 popular machine learning methods for cell type classification in scATAC-seq. Performance was evaluated on 4 public scATAC-seq datasets that differ in tissue, size and technology. We first ran intra-dataset experiments with 5-fold cross-validation, scoring each method by accuracy, recall and percentage of correctly predicted cells. We found that these methods may perform well for some cell types within a single dataset, but that the overall results are not as good as those reported for scRNA-seq analysis. To test classification ability across datasets, we then ran inter-dataset experiments, which assess the performance of the methods in realistic scenarios. SVM and NMC are the two best-performing methods overall across all experiments. We recommend that researchers use SVM or NMC as the underlying classifier when developing an automatic cell type classification method for scATAC-seq.
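The intra-dataset protocol can be illustrated with a minimal sketch. It is not the authors' pipeline: NMC is read here as a nearest mean (nearest centroid) classifier, the binary cell-by-peak matrix is synthetic, the folds are plain random splits (whether the original folds were stratified is not stated), and every function name is hypothetical.

```python
import random
from collections import defaultdict


def fit_centroids(X, y):
    """Return {label: mean feature vector} computed from the training cells."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for j, value in enumerate(row):
            sums[label][j] += value
        counts[label] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def predict(centroids, row):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k disjoint folds (not stratified)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, k=5, seed=0):
    """Return the per-fold accuracy of the nearest mean classifier."""
    folds = k_fold_indices(len(X), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        centroids = fit_centroids([X[j] for j in train_idx],
                                  [y[j] for j in train_idx])
        correct = sum(predict(centroids, X[j]) == y[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


def make_toy_data(n_per_type=40, n_peaks=30, seed=1):
    """Synthetic binary accessibility matrix (assumption, not real data).

    Each of two cell types has its own block of peaks that are open with
    probability 0.8; all other peaks are open with probability 0.1.
    """
    rng = random.Random(seed)
    X, y = [], []
    for t, label in enumerate(["type_A", "type_B"]):
        for _ in range(n_per_type):
            row = []
            for j in range(n_peaks):
                in_own_block = (j < n_peaks // 2) == (t == 0)
                row.append(1 if rng.random() < (0.8 if in_own_block else 0.1) else 0)
            X.append(row)
            y.append(label)
    return X, y


X, y = make_toy_data()
fold_accuracies = cross_validate(X, y, k=5)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
print(fold_accuracies, mean_accuracy)
```

The toy cell types are deliberately easy to separate, so the accuracy here says nothing about real scATAC-seq data, which is far sparser and noisier. An inter-dataset experiment would instead fit the centroids on one dataset and call `predict` on cells from another, after mapping both to a shared peak set.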
