Abstract

Technical differences that are biologically irrelevant, such as those arising from different experiments, time points, or sequencing platforms, can produce batch effects that mask true biological signal. Batch effects are therefore typically removed before single-cell RNA sequencing (scRNA-seq) datasets are used in downstream analyses. Existing batch correction methods usually mitigate batch effects by projecting data from different batches into a lower-dimensional space before clustering, which can lead to the loss of rare cell types. To address this problem, we introduce BDACL, a single-cell batch effect correction model that combines a Biological-noise Decoupling Autoencoder (BDA) with a Central-cross Loss (CL). The model first reconstructs the raw data with an autoencoder and performs preliminary clustering. We then construct a similarity matrix and a hierarchical clustering tree to delineate relationships within and between batches. Finally, we introduce the Central-cross Loss, which uses cross-entropy loss to help the model distinguish between cluster labels and a Central Loss to encourage samples to form more compact clusters in the embedding space, improving the consistency and interpretability of the clustering results and reducing differences between batches. The primary innovation of the model lies in reconstructing data with an autoencoder and gradually merging smaller clusters into larger ones using the hierarchical clustering tree. By using the reallocated cluster labels as training labels together with the Central-cross Loss, the model removes batch effects in an unsupervised manner. Compared with current methods, BDACL can mitigate batch effects without losing rare cell types.
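
To make the loss described above concrete, the following is a minimal sketch of how a cross-entropy term and a center-style term might be combined. It is not the authors' implementation: the abstract does not state how the two terms are weighted, so the additive form and the weight `lam` are assumptions, as are the function names and the choice of half the squared Euclidean distance for the central term. The sketch uses plain Python lists to stay dependency-free.

```python
import math


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch of logit rows."""
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # subtract the row maximum for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[y]
    return total / len(labels)


def center_loss(embeddings, labels, centers):
    """Mean of half the squared distance from each embedding to its cluster center."""
    total = 0.0
    for z, y in zip(embeddings, labels):
        total += 0.5 * sum((a - b) ** 2 for a, b in zip(z, centers[y]))
    return total / len(labels)


def central_cross_loss(logits, embeddings, labels, centers, lam=0.1):
    """Assumed combination: cross-entropy plus lam times the central term.

    `lam` is a hypothetical weight; the abstract does not specify one.
    """
    return cross_entropy(logits, labels) + lam * center_loss(embeddings, labels, centers)


# Toy batch: two cells, two clusters, two-dimensional embeddings.
logits = [[0.0, 0.0], [0.0, 0.0]]        # uninformative classifier outputs
embeddings = [[0.0, 0.0], [2.0, 0.0]]    # second cell sits 1 unit from its center
labels = [0, 1]                          # cluster labels after reallocation
centers = [[0.0, 0.0], [1.0, 0.0]]       # one center per cluster

loss = central_cross_loss(logits, embeddings, labels, centers, lam=0.1)
```

In this toy batch the cross-entropy term penalizes the classifier for failing to separate the two cluster labels, while the central term penalizes only the second cell, which lies away from its cluster center. In a trainable setting the centers would typically be updated alongside the encoder, which this sketch leaves out.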
