SDPSO: Spark Distributed PSO-based approach for feature selection and cancer disease prognosis

Khawla Tadist, Azeddine Zahi, Said Najah, Nikola S Nikolov, Fatiha Mrabti

doi:10.1186/s40537-021-00409-x


Open Access

https://doi.org/10.1186/s40537-021-00409-x


Journal: Journal of Big Data	Publication Date: Jan 13, 2021
Citations: 11	License type: open-access

Affiliation: Sidi Mohamed Ben Abdellah University

Abstract

The curse of dimensionality is one of the most critical issues hindering progress in many fields, and in bioinformatics in particular. Countering it requires a composite solution. Among the techniques that have proved effective, scaling-based dimensionality reduction techniques are the most prevalent. To ensure improved performance and productivity, horizontal scaling functions are combined with Particle Swarm Optimization (PSO) based computational techniques. Optimization algorithms are an interesting alternative to traditional feature selection methods: they are both efficient and relatively easy to scale. PSO is an iterative search algorithm that has achieved excellent results on feature selection problems. This paper proposes a composite Spark Distributed approach to feature selection for cancer prognosis that combines an integrative feature selection algorithm based on Binary Particle Swarm Optimization (BPSO) with the PSO algorithm, hence the name Spark Distributed Particle Swarm Optimization (SDPSO). The effectiveness of the proposed approach is demonstrated on five benchmark genomic datasets and in a comparative study with four state-of-the-art methods. Compared with these four methods, the proposed approach yields the best average results, with purity ranging from 0.78 to 0.97 and F-measure ranging from 0.75 to 0.96.
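As a rough illustration of how binary PSO can drive feature selection, the sketch below evolves 0/1 feature masks with the textbook velocity update and a sigmoid transfer function. It is not the SDPSO implementation described in the paper: it runs on a single machine, and the function name `binary_pso`, the hyperparameter values, and the toy fitness function are assumptions made purely for illustration.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximise fitness(mask) over 0/1 feature masks with binary PSO."""
    rng = random.Random(seed)
    # Random initial masks (positions) and velocities.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
           for _ in range(n_particles)]
    # Personal bests and the global best seen so far.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp the velocity
                vel[i][d] = v
                # Sigmoid transfer: probability that this feature bit is 1.
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Hypothetical toy fitness: features 0-2 are "informative";
# every selected feature costs a small penalty.
INFORMATIVE = {0, 1, 2}


def toy_fitness(mask):
    hits = sum(1 for d, bit in enumerate(mask) if bit and d in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best_mask, best_fit = binary_pso(toy_fitness, n_features=10)
```

In a realistic setting the fitness would score a classifier or clustering built on the selected features, and that evaluation is the expensive step a distributed engine such as Spark could parallelise across particles.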

Highlights

Deep Sequencing is the process of deoxyribonucleic acid (DNA) fractioning, which has dramatically transformed the genomic research field
The Spark Distributed Particle Swarm Optimization (SDPSO) approach provides average purity and F-measure scores that are significantly higher than those of four state-of-the-art methods, namely k-means, Genetic Algorithm (GA), Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and the hybrid Particle Swarm Optimization (PSO)-GA [19]
Experimental design: to test the performance of the SDPSO approach and its capacity to process high-dimensional datasets with low computational runtime, the experimentation starts with a non-distributed architecture, followed by a distributed one using PySpark on Spark 2.4
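The purity and F-measure scores mentioned above can be computed from true class labels and predicted cluster labels. The sketch below uses one common definition of each: purity as the fraction of samples that fall in the majority class of their cluster, and a class-weighted F-measure that takes the best-matching cluster for each class. Whether the paper uses exactly these definitions is an assumption, and the sample labels are made up.

```python
from collections import Counter


def purity(labels_true, labels_pred):
    """Fraction of samples belonging to the majority class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels_true)


def clustering_f_measure(labels_true, labels_pred):
    """Class-size-weighted average of the best per-class F1 over all clusters."""
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_ij = joint[(cls, clu)]  # samples of class cls in cluster clu
            if n_ij == 0:
                continue
            precision = n_ij / n_clu
            recall = n_ij / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_cls / n) * best
    return total


# Made-up labels: three tumor and three normal samples, two clusters.
y_true = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
y_pred = [0, 0, 1, 1, 1, 1]

p_score = purity(y_true, y_pred)
f_score = clustering_f_measure(y_true, y_pred)
```

On this toy input, one tumor sample lands in the mostly-normal cluster, so both scores fall below 1.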

Summary

Introduction

Deep Sequencing is the process of DNA fractioning, which has dramatically transformed the genomic research field. The advances in this process over the last decade have led to the continuous generation of immense amounts of data, placing genomics among the top big data generating fields [1]. Cancer prognosis is very complicated due to the nature of genomic datasets, which contain thousands of features but relatively few samples. Traditional machine learning techniques fall short in this area since they are designed for datasets that have few features and many samples [3], leading to the necessity of novel technologies, big data analytical techniques



