Abstract

Tree-based Pipeline Optimization Tool (TPOT) is an automated machine learning (AutoML) system that recommends optimal pipeline for supervised learning problems by scanning data for novel features, selecting appropriate models and optimizing their parameters. However, like other AutoML systems, TPOT may reach computational resource limits when working on big data such as whole-genome expression data. We develop two novel features for TPOT, Feature Set Selector and Template, which leverage domain knowledge, greatly reduce the computational expense and flexibly extend TPOT's application to biomedical big data analysis.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call