Analysis of Meta-Learning Approaches for TCGA Pan-cancer Datasets

Jingyuan Chou, Mengdi Huai, Chongzhi Zang, Stefan Bekiranov, Aidong Zhang

doi:10.1109/bibm49941.2020.9313397

Abstract

Cancer has been characterized as a heterogeneous disease, and the classification of cancer subtypes has become a necessity in cancer research, as it facilitates the subsequent clinical management of patients and provides decision support for clinicians. With the advance of machine learning over the last decade, many researchers have employed machine learning to tackle the cancer classification problem. However, traditional machine learning algorithms require a large amount of annotated data for model training, and collecting such data is time-consuming and expensive and may not be realistic in real-world settings. Meta-learning has been proposed to address this data scarcity. It utilizes prior knowledge learned from related tasks to generalize to new tasks with limited supervised experience, and it has been applied to scarce-annotation problems in many fields, such as few-shot image classification and drug discovery. Data scarcity is common in cancer research and diagnosis studies, yet only a few previous studies classify cancers based on limited annotated data. We therefore explore the model-agnostic meta-learning algorithm (MAML) for the scenario where only limited annotated data are available. In this work, our objective is to comprehensively compare MAML with few-shot learning methods (matching network and prototypical network) and traditional machine learning methods (random forest and K-nearest neighbor). Experimental results on The Cancer Genome Atlas (TCGA) cancer patient data demonstrate the effectiveness and superiority of MAML over the other methods, including its ability to outperform them using 4.5-fold fewer features.

Full Text