A Novel Clustering and Matrix Based Computation for Big Data Dimensionality Reduction and Classification

Jijo Varghese, P Tamil Selvan

doi:10.37934/araset.32.1.238251

Abstract

Clustering and classifying higher-dimensional or "Big Data (BD)" document collections requires attention to the dimensions of the documents. Resolving the volume issue of the documents can also reduce the overhead of classification methods. However, reducing the dimensions of a document collection might itself introduce noise and abnormalities. Several strategies for removing noise and abnormal information have been established over time. To increase classification accuracy, existing classifiers, or new ones created for the task, must deal with some of the most difficult issues in BD document categorization and clustering. The goals of this research therefore derive from issues that can be solved only by improving the classification accuracy of classifiers. Superior clusters may also be achieved by using effective "Dimensionality Reduction (DR)". As the first step in this research, we introduce a novel DR approach that preserves word frequency in the document collection, allowing the classification algorithm to reach improved, or at least equal, classification accuracy on a lower-dimensional set of documents. For clustering "Word Patterns (WPs)" during "WP Clustering (WPC)", we propose a new WP "Similarity Function (SF)" for the "Similarity Computation (SC)" used in WPC. DR of the document collection is accomplished using information gained from the various WP clusters. Finally, we provide "Similarity Measures" for SC of high-dimensional texts and an SF for document classification. Using assessment criteria such as "Information-Ratio for Dimension-Reduction", "Accuracy", and "Recall", we found that the proposed method, WP paired with SC (WP-SC), scaled effectively to higher-dimensional "Datasets (DS)" and surpassed the existing technique AFO-MKSVM.
According to the findings, the WP-SC approach produced more favorable outcomes than the LDA-SVM and AFO-MKSVM approaches.
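
The abstract does not give the word-pattern definition or the similarity function, so the following is only a minimal sketch of the general idea (cluster words by their word patterns, then merge each cluster into one feature while preserving word frequency). Everything in it is an assumption made for illustration: the word pattern is taken to be the share of a word's frequency falling in each class, cosine similarity stands in for the paper's SF, and the greedy threshold clustering, the function names, and the toy data are hypothetical.

```python
import math

def word_patterns(doc_term, labels, n_classes):
    """Assumed word pattern: share of each word's total frequency per class."""
    patterns = []
    for j in range(len(doc_term[0])):
        per_class = [0.0] * n_classes
        for row, label in zip(doc_term, labels):
            per_class[label] += row[j]
        total = sum(per_class)
        patterns.append([c / total if total else 0.0 for c in per_class])
    return patterns

def cosine(u, v):
    """Cosine similarity, used here as a stand-in for the paper's SF."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_words(patterns, threshold=0.9):
    """Greedy single pass: a word joins the first cluster whose seed
    pattern is at least `threshold` similar, else it starts a new cluster."""
    seeds, assignment = [], []
    for p in patterns:
        for k, seed in enumerate(seeds):
            if cosine(p, seed) >= threshold:
                assignment.append(k)
                break
        else:
            seeds.append(p)
            assignment.append(len(seeds) - 1)
    return assignment

def reduce_documents(doc_term, assignment):
    """Sum word counts within each cluster, so each document's total
    word frequency is preserved in the reduced representation."""
    n_clusters = max(assignment) + 1
    reduced = []
    for row in doc_term:
        new_row = [0] * n_clusters
        for j, count in enumerate(row):
            new_row[assignment[j]] += count
        reduced.append(new_row)
    return reduced

# Toy data (hypothetical): 4 documents x 4 words, two classes.
doc_term = [
    [2, 1, 0, 0],
    [3, 2, 0, 1],
    [0, 0, 4, 2],
    [0, 1, 3, 3],
]
labels = [0, 0, 1, 1]

patterns = word_patterns(doc_term, labels, n_classes=2)
assignment = cluster_words(patterns, threshold=0.9)
reduced = reduce_documents(doc_term, assignment)

print(assignment)  # [0, 0, 1, 1]
print(reduced)     # [[3, 0], [5, 1], [0, 6], [1, 6]]
```

In this toy run the four word dimensions collapse to two, and each document's row sum is unchanged, which is one simple reading of "preserves word frequency". The reduced matrix could then be passed to any classifier in place of the original.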

Journal: Journal of Advanced Research in Applied Sciences and Engineering Technology	Publication Date: Aug 19, 2023
License type: cc-by-nc
