Distributed Clustering Approach by Apache Pyspark Based on SEER for Clinical Data

R Ramesh, M V Judy

doi:10.1142/s0218001422400067

Abstract

Data clustering is a thoroughly studied data mining problem. As the amount of data being analyzed grows exponentially, clustering large clinical datasets such as the Surveillance, Epidemiology, and End Results (SEER) cancer datasets raises several problems. Traditional clustering methods are severely constrained in speed, productivity, and adaptability. This paper summarizes the most recent distributed clustering algorithms, organized according to the computing platforms used to process vast volumes of data. The purpose of this work is to offer an optimized distributed clustering strategy that reduces the algorithm's total execution time. We obtained, preprocessed, and analyzed clinical SEER data on liver cancer, respiratory cancer, human immunodeficiency virus (HIV)-related lymphoma, and lung cancer for large-scale data clustering analysis. The paper makes three major contributions. First, three current Pyspark distributed clustering algorithms were evaluated on SEER clinical data using a simulated New York cancer dataset. Second, systemic inflammatory response syndrome (SIRS) model inference was performed and described using three SEER cancer datasets. Third, using lung cancer data, we propose an optimized distributed bisecting k-means method. We report the results of the proposed optimized distributed clustering technique, which demonstrate the performance improvement.
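For readers unfamiliar with bisecting k-means, the following is a minimal single-machine sketch of the general technique in plain Python. It is not the optimized distributed method proposed in the paper, and every name in it (`two_means`, `bisecting_kmeans`, the choice to split the cluster with the largest sum of squared errors, the farthest-point initialization) is an assumption made for illustration. In Pyspark itself, a distributed implementation is available as `pyspark.ml.clustering.BisectingKMeans`.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, rng, max_iters=20):
    """Split one cluster in two with Lloyd iterations (k = 2).

    Assumes the cluster holds at least two distinct points.
    """
    # Illustrative initialization: a random point and the point farthest from it.
    c0 = rng.choice(points)
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    left, right = None, None
    for _ in range(max_iters):
        new_left = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        new_right = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not new_left or not new_right:
            break  # keep the last valid split
        if (new_left, new_right) == (left, right):
            break  # assignments stopped changing
        left, right = new_left, new_right
        c0, c1 = centroid(left), centroid(right)
    return left, right


def bisecting_kmeans(points, k, seed=0):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist.

    Points must be hashable tuples. Fewer than k clusters are returned when
    no remaining cluster contains two distinct points.
    """
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(set(c)) > 1]
        if not splittable:
            break
        target = max(splittable, key=sse)
        clusters = [c for c in clusters if c is not target]
        clusters.extend(two_means(target, rng))
    return clusters


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0),
              (100, 0), (100, 1), (101, 0)]
    for cluster in bisecting_kmeans(sample, k=3):
        print(sorted(cluster))
```

Splitting the cluster with the largest SSE is only one common selection rule; splitting the largest cluster by size is another. In a distributed setting, the assignment and centroid-update steps are the parts that would be parallelized across partitions.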

Published in: International Journal of Pattern Recognition and Artificial Intelligence
