Lung Cancer Classification Using Random Oversampling and Gradient Boosted Decision Tree

Wahyudi Setiawan, Mulaab, Yoga Dwitya Pramudita

doi:10.47577/technium.v16i.9997

Open Access

Abstract

Lung cancer has the highest number of cases among men, particularly in Indonesia. An unhealthy lifestyle, smoking, and pollution aggravate patients' conditions. This study addresses the diagnosis of patients with suspected lung cancer. The experiments use three public datasets: “Cancer Patient,” “Survey Lung Cancer,” and “Cancer_Data.” The research phases are exploratory data analysis (EDA), pre-processing, and classification. EDA identifies data types, missing values, correlations between attributes, and outliers. Pre-processing consists of data cleaning and data discretization. Random oversampling is then applied to overcome class imbalance. The final step is classification using a Gradient Boosted Decision Tree (GBDT). The experiment scenarios use both imbalanced and balanced data. In the testing scenarios, the learning rate and the number of trees are varied using randomized search tuning. Training and testing data are split using 5-fold cross-validation. The results show that classification on class-balanced data performs better than on imbalanced data. The datasets are also classified with k-nearest neighbor and support vector machine for comparison. GBDT performs better on two of the datasets.
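
The random oversampling step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `random_oversample`, the toy rows, and the "YES"/"NO" labels are assumptions made for the example. The idea shown is only the general technique, in which randomly chosen minority-class samples are duplicated until every class matches the size of the largest class.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap to the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (assumed for illustration): 6 "NO" samples and 2 "YES" samples.
rows = [[i, i % 3] for i in range(8)]
labels = ["NO"] * 6 + ["YES"] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(labels))           # class counts before oversampling
print(Counter(balanced_labels))  # class counts after oversampling
```

In the workflow the abstract describes, data balanced in this way would then be passed to the GBDT classifier and evaluated with 5-fold cross-validation.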

Journal: Technium: Romanian Journal of Applied Sciences and Technology
Publication Date: Oct 29, 2023
License type: CC BY 4.0

Similar Papers

An examination on the effect of CVNN parameters while classifying the real-valued balanced and unbalanced data
Yunus Emre Acar ... Murat Ceylan
01 Sep 2018

Rapid measurement of classification levels of primary macronutrients in durian (Durio zibethinus Murray CV. Mon Thong) leaves using FT-NIR spectrometer and comparing the effect of imbalanced and balanced data for modelling
Thitima Phanomsophon ... Panmanas Sirisomboon
Measurement | VOL. 203
23 Sep 2022

Dental implants success prediction by classifier ensemble on imbalanced data
Mostafa Sabzekar ... Vahide Babaiyan
Computer Methods and Programs in Biomedicine Update | VOL. 1
01 Jan 2020

DATA IMBALANCE IN LANDSLIDE SUSCEPTIBILITY ZONATION: UNDER-SAMPLING FOR CLASS-IMBALANCE LEARNING
S K Gupta ... D P Shukla
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences | VOL. XLII-3/W11
14 Feb 2020
