Large scale biomedical data analysis with tree-based automated machine learning

Trang T Le,Jason H Moore,Weixuan Fu

doi:10.1145/3377929.3397770

Open Access

https://doi.org/10.1145/3377929.3397770

Publication Date: Jul 8, 2020

Affiliation: University of Pennsylvania

#Tree-based Pipeline Optimization Tool #Biomedical Data Analysis

Abstract

Tree-based Pipeline Optimization Tool (TPOT) is an automated machine learning (AutoML) system that recommends an optimal pipeline for supervised learning problems by scanning data for novel features, selecting appropriate models, and optimizing their parameters. However, like other AutoML systems, TPOT may reach computational resource limits when working with big data such as whole-genome expression data. We develop two novel features for TPOT, Feature Set Selector and Template, which leverage domain knowledge, greatly reduce computational expense, and flexibly extend TPOT's application to biomedical big data analysis.
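
The two ideas named in the abstract can be illustrated with a small, library-free sketch. It is not TPOT's API or implementation. The feature-set names, column indices, step names, and helper functions below are all hypothetical, and the exhaustive enumeration stands in for TPOT's actual search only to show how a fixed template and predefined feature sets bound the space of candidate pipelines.

```python
from itertools import product

# Hypothetical feature sets (for example, gene groups defined from domain
# knowledge), mapping a set name to column indices in the data matrix.
FEATURE_SETS = {
    "pathway_a": [0, 1],
    "pathway_b": [2, 3, 4],
}


def select_feature_set(rows, set_name, feature_sets=FEATURE_SETS):
    """Keep only the columns that belong to one named feature set."""
    columns = feature_sets[set_name]
    return [[row[c] for c in columns] for row in rows]


# A template fixes the order of step types, so every candidate pipeline
# has the same shape: selector, then transformer, then classifier.
TEMPLATE = ("selector", "transformer", "classifier")

# Hypothetical choices available for each step type.
OPTIONS = {
    "selector": ["pathway_a", "pathway_b"],
    "transformer": ["standard_scaler", "pca"],
    "classifier": ["decision_tree", "random_forest"],
}


def enumerate_pipelines(template=TEMPLATE, options=OPTIONS):
    """List every pipeline that matches the template, one choice per step."""
    return [tuple(combo) for combo in product(*(options[step] for step in template))]


# Toy data: two samples with five features each.
rows = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
subset = select_feature_set(rows, "pathway_b")
pipelines = enumerate_pipelines()
```

In this toy setting, each candidate pipeline sees only the columns of one feature set rather than all features, and the template limits the candidates to a fixed number of steps, which is the kind of reduction in computational expense the abstract describes.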

Full Text