Spark and Flink have made great strides in big data processing in recent years. Although Spark can already process large volumes of data in parallel, it still falls short in model transparency, credibility, and practicality. This paper provides a comprehensive overview of how to tackle the performance bottlenecks, insufficient model interpretability, and lack of regional adaptability that Spark and Flink face in big data processing. It first discusses introducing interpretable algorithms such as SHAP and LIME to enhance the transparency of neural network models and users' trust in them. It then discusses how to combine time-aware transfer learning with Geographic Information Systems (GIS) technology to improve the generalization and adaptability of Spark machine learning models. Time-aware transfer learning exploits the temporal evolution of historical data to ensure models continue to perform well in new time periods or scenarios, while GIS technology enables more precise predictions and analyses grounded in geographical data, enhancing spatial adaptability. Lastly, the study explores hybrid processing strategies that integrate Apache Flink, Kafka Streams, and Spark batch processing. This approach not only enables efficient real-time data processing alongside detailed offline analysis but also enhances the models' flexibility and processing capability in complex data scenarios. By integrating these techniques, big data processing frameworks can address complex real-world challenges more efficiently and effectively, thereby advancing technology and application development in related fields.
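To make the interpretability discussion concrete: SHAP attributes a prediction to features via Shapley values from cooperative game theory. The following is a minimal sketch of that principle on a toy linear model with hypothetical weights, not a use of the actual `shap` library (which approximates the same quantity for arbitrary models):

```python
from itertools import combinations
from math import factorial

# Toy model: prediction is a weighted sum of three features.
# Weights and the all-zero baseline are hypothetical, for illustration only.
WEIGHTS = {"age": 2.0, "income": 0.5, "tenure": 1.0}
BASELINE = {"age": 0.0, "income": 0.0, "tenure": 0.0}

def predict(features):
    return sum(WEIGHTS[f] * v for f, v in features.items())

def shapley_values(instance, baseline):
    """Exact Shapley attribution: each feature gets its average marginal
    contribution over all subsets of the remaining features."""
    names = list(instance)
    n = len(names)
    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                # Prediction with the subset "present", then with f added.
                with_s = {g: (instance[g] if g in subset else baseline[g])
                          for g in names}
                with_sf = dict(with_s, **{f: instance[f]})
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (predict(with_sf) - predict(with_s))
        phi[f] = total
    return phi

x = {"age": 3.0, "income": 10.0, "tenure": 2.0}
phi = shapley_values(x, BASELINE)
# For a linear model each Shapley value reduces to weight * (value - baseline),
# and the values sum to predict(x) - predict(BASELINE).
```

The efficiency property checked in the last comment (attributions summing exactly to the prediction gap) is what makes Shapley-based explanations auditable to end users.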
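The abstract does not specify how time-awareness is realized; one common scheme, sketched below under that assumption, is to weight historical samples by recency when refitting, so the model tracks temporal drift. The half-life parameter and the drifting signal are hypothetical:

```python
from math import exp, log

def time_decay_weights(timestamps, now, half_life):
    """Weight each sample by 2^(-age / half_life), so a sample that is
    half_life old counts half as much as a current one."""
    lam = log(2) / half_life
    return [exp(-lam * (now - t)) for t in timestamps]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical drifting signal: old observations near 10, recent near 20.
timestamps = [0, 1, 2, 8, 9, 10]
values     = [10, 10, 10, 20, 20, 20]

weights = time_decay_weights(timestamps, now=10, half_life=2.0)
estimate = weighted_mean(values, weights)
# The recency-weighted estimate sits close to the recent level (20),
# whereas the unweighted mean (15) lags the drift.
```

In a Spark pipeline the same idea would appear as a per-row weight column fed to a weighted estimator, refreshed as new time periods arrive.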
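For the GIS-based spatial adaptability, a minimal illustration is routing each record to the regionally adapted model whose centre is geographically closest. The centres below are hypothetical; the haversine formula itself is standard:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Hypothetical regional model centres: name -> (lat, lon).
CENTRES = {"beijing": (39.9, 116.4), "shanghai": (31.2, 121.5)}

def nearest_region(lat, lon):
    """Route a record to the region whose centre is closest."""
    return min(CENTRES, key=lambda c: haversine_km(lat, lon, *CENTRES[c]))

region = nearest_region(32.0, 118.8)  # a point near Nanjing
```

Real GIS stacks add polygon containment, spatial joins, and projections on top of this, but nearest-centre routing already captures the idea of conditioning predictions on location.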
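The hybrid stream/batch strategy can be sketched as a lambda-style serving layer: a periodic batch view plus a running streaming delta, merged at query time. In the paper's setting the batch view would come from Spark and the delta from Flink or Kafka Streams; plain dictionaries stand in for both here:

```python
# Result of the last batch job (stale between runs).
batch_view = {"clicks": 1000, "orders": 50}

# Incremental counts accumulated by the streaming path since that job.
streaming_delta = {}

def ingest(event_key, count=1):
    """Streaming path: fold each incoming event into the delta."""
    streaming_delta[event_key] = streaming_delta.get(event_key, 0) + count

def query(key):
    """Serving path: combine the stale batch view with the fresh delta,
    so reads see both historical and real-time data."""
    return batch_view.get(key, 0) + streaming_delta.get(key, 0)

ingest("clicks")
ingest("clicks")
ingest("orders")
# query("clicks") -> 1002, query("orders") -> 51
```

The design choice this illustrates is the one the abstract claims as a benefit: real-time freshness from the streaming path without giving up the thorough, exactly-once recomputation of the batch path.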