Abstract 2093: Introducing InMoose, an integrated open source Python package for multi-omic analyses

Maximilien Colange,Guillaume Appé,Abdelkader Behdenna,Akpéli V Nordor

doi:10.1158/1538-7445.am2023-2093

Abstract

The recent exponential progress of sequencing technologies has dramatically impacted cancer research and paved the way to precision medicine in cancer care. In parallel, rapid progress in bioinformatics has been essential to allow analysts to embrace the vast amount of data yielded by high-throughput profiling machines, turn this data into cancer biology knowledge, and ultimately develop innovative approaches to cancer care. Still, computational complexity and tool interoperability remain major challenges for the advancement of -omic data-driven cancer research. Despite the historical prevalence of R, Python is gaining momentum in the bioinformatics landscape. As a general purpose language, it offers numerous advantages:

- its overall ecosystem facilitates the integration of bioinformatics tools into large-scale frameworks, increasing their versatility and widening their target audience;
- its straightforward syntax and user-friendly logic make it a popular language in education, and explain its wide adoption in various sectors;
- its wide adoption favors interdisciplinarity, e.g. it can flatten the learning curve for engineers eager to make an impact in cancer research.

These advantages motivate the further development of the Python bioinformatics ecosystem. We therefore advocate porting reference tools from R to Python, with the ambition of shaping a comprehensive ecosystem. As a sign of this trend, state-of-the-art tools have been developed directly in Python (e.g. lifelines, a library dedicated to survival analysis) or quickly ported to Python from R (e.g. harmony, an R library for integrating single-cell data). We introduce InMoose (Integrated Multi-Omics Open Source Environment), a unified open source Python framework for every -omic data type. It is based on recognized tools and focuses on efficiency and user-friendliness. InMoose is accessible at https://github.com/epigenelabs/inmoose and is released under the GPL3 license.
The first version of InMoose focuses on bringing transcriptomics tools, mostly based on the edgeR R package, to the Python world. It features batch-effect correction algorithms, as well as differential expression analysis, for microarray and RNA-seq data. InMoose demonstrates the advantages of our approach:

- it is developed with the intent to capitalize on and facilitate interactions with other popular Python data-oriented libraries (e.g. pandas, numpy);
- it demonstrates significant computational performance improvements;
- it integrates easily into web-based, user-friendly analysis platforms (e.g. Epigene Labs’ proprietary mCUBE platform);
- porting existing code from one language to another provides a window of opportunity to improve both functionality and performance.

We expect that our effort will help foster a larger collaborative effort to build and grow a consistent state-of-the-art Python platform for cancer bioinformatics.

Citation Format: Maximilien Colange, Guillaume Appé, Akpéli Nordor, Abdelkader Behdenna. Introducing InMoose, an integrated open source Python package for multi-omic analyses [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 2093.
