Lifelong Machine Learning and root cause analysis for large-scale cancer patient data

Gautam Pal,Zhuo Wang,Katie Atkinson,Gangmin Li,Hongyi Wu,Xianbin Hong

doi:10.1186/s40537-019-0261-9

Abstract

Introduction

This paper presents a lifelong learning framework that continually adapts to changing data patterns through incremental learning. In many big data systems, iteratively re-training on high-dimensional data from scratch is computationally infeasible, because a constant stream of new data on top of a historical data pool increases the training time exponentially. The challenge is therefore to retain past learning and to update the model quickly and incrementally as new data arrives. In addition, current machine learning approaches make predictions without providing a comprehensive root cause analysis. To address these limitations, our framework is built on an ensemble of stream data and historical batch data for an incremental lifelong machine learning (LML) model.

Case description

A cancer patient’s pathological tests, such as blood, DNA, urine, or tissue analysis, provide a unique signature based on the DNA combinations. Our analysis supports personalized and targeted medication to achieve a therapeutic response. The model is evaluated on data from the National Cancer Institute’s Genomic Data Commons unified data repository. The aim is to prescribe personalized medicine based on the thousands of genotype and phenotype parameters for each patient.

Discussion and evaluation

The model uses a dimension reduction method to reduce training time in an online sliding-window setting. We identify the Gleason score as a determining factor for cancer possibility and substantiate this claim with the Lilliefors and Kolmogorov–Smirnov tests. We present clustering and Random Decision Forest results. The model’s prediction accuracy is compared with standard machine learning algorithms for numeric and categorical fields.

Conclusion

We propose an ensemble framework of stream and batch data for incremental lifelong learning. The framework first applies a streaming clustering technique and then a Random Decision Forest (RDF) regressor/classifier to isolate anomalous patient data, and it provides reasoning through root cause analysis based on feature correlations, with the aim of improving the overall survival rate. While the stream clustering technique creates groups of patient profiles, the RDF drills down into each group for comparison and reasoning, yielding actionable insights. The proposed multi-agent Lambda Architecture (MALA) retains past learned knowledge, transfers it to future learning, and becomes more knowledgeable over time.
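The first stage of the pipeline described above, stream clustering used to isolate anomalous records, might be sketched as follows. The abstract does not name the clustering algorithm, so this sketch substitutes sequential (online) k-means as a stand-in; the class `OnlineKMeans`, the helper `is_anomalous`, the distance threshold, and the toy two-feature records are all assumptions made for illustration. The Random Decision Forest stage is not shown.

```python
import math
from typing import List, Sequence, Tuple


class OnlineKMeans:
    """Sequential k-means: each arriving point nudges its nearest centroid.

    Hypothetical stand-in for the paper's stream clustering step.
    """

    def __init__(self, initial_centroids: Sequence[Sequence[float]]) -> None:
        self.centroids: List[List[float]] = [list(c) for c in initial_centroids]
        # Each initial centroid is treated as one already-seen observation.
        self.counts: List[int] = [1] * len(self.centroids)

    def nearest(self, x: Sequence[float]) -> Tuple[int, float]:
        """Return the index of the closest centroid and the distance to it."""
        distances = [math.dist(x, c) for c in self.centroids]
        idx = min(range(len(distances)), key=distances.__getitem__)
        return idx, distances[idx]

    def update(self, x: Sequence[float]) -> int:
        """Assign x to its nearest cluster and move that centroid toward x."""
        idx, _ = self.nearest(x)
        self.counts[idx] += 1
        rate = 1.0 / self.counts[idx]  # running-mean step size
        self.centroids[idx] = [
            c + rate * (v - c) for c, v in zip(self.centroids[idx], x)
        ]
        return idx


def is_anomalous(model: OnlineKMeans, x: Sequence[float], threshold: float) -> bool:
    """Flag a record that lies far from every cluster (threshold is assumed)."""
    _, distance = model.nearest(x)
    return distance > threshold


# Toy stream of two-feature records (made-up values, not patient data).
stream = [(0.0, 0.2), (5.0, 5.2), (0.2, 0.0), (5.2, 5.0)]
model = OnlineKMeans(initial_centroids=[(0.0, 0.0), (5.0, 5.0)])
assignments = [model.update(x) for x in stream]
```

Because each update touches only one centroid and one counter, the cost per record stays constant however much history has already been absorbed. That is the property an incremental setting needs.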

Highlights

This paper presents a lifelong learning framework that continually adapts to changing data patterns through incremental learning.
We propose an ensemble framework of stream and batch data for incremental lifelong learning.
We present a unique dimension reduction method for feature vectors that enables quick re-training for clustering and Random Decision Forest through the multi-agent Lambda Architecture (MALA).

Summary

Introduction

This paper presents a lifelong learning framework that continually adapts to changing data patterns through incremental learning. The challenge is to retain past learning and to update the model quickly and incrementally as new data arrives. Current machine learning approaches make predictions without providing a comprehensive root cause analysis. To address these limitations, our framework is built on an ensemble of stream data and historical batch data for an incremental lifelong machine learning (LML) model. Using a configurable mini-batch time window of a few hours, the model is re-trained while it continues to predict on test data. A stream processor can merge additional static data into the data stored in HDFS, and the mini-batch can pick up and continue from there.

Journal: Journal of Big Data	Publication Date: Dec 1, 2019
Citations: 7	License type: open-access
