Procrustes is a machine-learning approach that removes cross-platform batch effects from clinical RNA sequencing data

Alexander Bagaev, Artur Baisangurov, Cagdas Tazearslan, Dawn Fernandez, Dmitry Kravchenko, Ekaterina Belova, Elena Vasileva, Ilya Cheremushkin, John Curran, Katerina Nuzhdina, Kelley Morgan, Kirill Shaposhnikov, Krystle Nomie, Kushal Suryamohan, Leznath Kaneunyenye, Madison Chasse, Maria Sorokina, Mary Abdou, Maxim Chelushkin, Nathan Fowler, Nikita Kotlov, Pavel Zemskiy, Svetlana Khorkova, Svetlana Podsvirova, Yaroslav Lozinsky

doi:10.1038/s42003-024-06020-z

Abstract

With the increased use of gene expression profiling for personalized oncology, optimized RNA sequencing (RNA-seq) protocols and algorithms are necessary to provide comparable expression measurements between exome capture (EC)-based and poly-A RNA-seq. Here, we developed and optimized an EC-based protocol for processing formalin-fixed, paraffin-embedded samples, together with a machine-learning algorithm, Procrustes, that overcomes batch effects across RNA-seq data obtained with different sample preparation protocols, such as EC-based and poly-A RNA-seq. After applying Procrustes to samples processed with both EC and poly-A RNA-seq protocols, the expression of 61% of genes (N = 20,062) correlated across the two protocols (concordance correlation coefficient > 0.8, versus 26% before transformation by Procrustes), including 84% of cancer-specific and cancer microenvironment-related genes (N = 1,438; versus 36% before applying Procrustes). Benchmarking analyses also showed that Procrustes outperforms other batch correction methods. Finally, we showed that Procrustes can project RNA-seq data for a single sample onto a larger cohort of RNA-seq data. Future application of Procrustes will enable direct gene expression analysis of single tumor samples to support gene expression-based treatment decisions.
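
The agreement criterion quoted in the abstract (concordance correlation coefficient > 0.8) can be illustrated with a short sketch. It assumes Lin's standard definition of the coefficient with population (biased) variances; the abstract does not state which estimator the authors used, and the function names `concordance_correlation` and `fraction_concordant` are hypothetical, not taken from the paper.

```python
import numpy as np


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between two 1-D vectors.

    Assumption: population (biased) variance and covariance are used.
    Returns NaN when both vectors are constant and equal (0/0).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    cov_xy = np.mean((x - mean_x) * (y - mean_y))
    # Unlike Pearson correlation, the denominator penalizes a shift in means.
    return 2.0 * cov_xy / (x.var() + y.var() + (mean_x - mean_y) ** 2)


def fraction_concordant(expr_a, expr_b, threshold=0.8):
    """Fraction of genes whose per-gene CCC across paired samples exceeds threshold.

    expr_a, expr_b: arrays of shape (n_samples, n_genes) holding the same
    samples measured with two protocols (hypothetical layout).
    """
    expr_a = np.asarray(expr_a, dtype=float)
    expr_b = np.asarray(expr_b, dtype=float)
    ccc = np.array([
        concordance_correlation(expr_a[:, g], expr_b[:, g])
        for g in range(expr_a.shape[1])
    ])
    return float(np.mean(ccc > threshold))
```

Because the coefficient penalizes systematic offsets as well as scatter, a gene whose values are perfectly linearly related across protocols but shifted by a constant can still fall below the threshold, which is why it is a stricter agreement measure than Pearson correlation for cross-protocol comparisons.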

Journal: Communications Biology
Publication Date: Mar 30, 2024
Citations: 1
License type: CC BY 4.0
