Hybridized Dimensionality Reduction Method for Machine Learning based Web Pages Classification

doi:10.33103/uot.ijccce.22.3.9

Abstract

High dimensionality of the feature space is a well-known problem in text classification and web mining; it is caused mainly by the large vocabulary contained in web documents. Over the years, several methods have been applied to select the most useful and important features; however, their performance can still be improved in aspects such as computational cost and accuracy. This research presents an enhanced cosine similarity-based hybridization of two efficient feature selection methods for higher classification performance. The reduced feature sets are generated individually using Random Projection (RP) and Principal Component Analysis (PCA), then hybridized based on the cosine similarity values between the feature vectors. The performance of the proposed method, in terms of accuracy and F-measure, was tested on a dataset of web pages using several term weighting schemes. Compared to relevant methods, the proposed method shows significantly higher accuracy and F-measure with a smaller feature set.

Index Terms: Cosine similarity, Dimensionality Reduction, Feature selection, PCA, Random Projection.
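The abstract does not spell out the hybridization rule, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that all PCA components are kept and that an RP component is added only when its largest absolute cosine similarity to any PCA component falls below a threshold. The function names, the `threshold` value, and the random stand-in for a term-weighted document matrix are all hypothetical.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def rp_reduce(X, k, rng):
    """Gaussian random projection of the rows of X down to k dimensions."""
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R

def cosine_matrix(A, B):
    """Cosine similarity between every column of A and every column of B."""
    A_n = A / (np.linalg.norm(A, axis=0, keepdims=True) + 1e-12)
    B_n = B / (np.linalg.norm(B, axis=0, keepdims=True) + 1e-12)
    return A_n.T @ B_n

def hybridize(X, k_pca, k_rp, threshold=0.5, seed=0):
    """Assumed rule: keep every PCA feature, and add only those RP features
    that are not too similar (in absolute cosine) to any PCA feature."""
    rng = np.random.default_rng(seed)
    Z_pca = pca_reduce(X, k_pca)
    Z_rp = rp_reduce(X, k_rp, rng)
    sim = np.abs(cosine_matrix(Z_rp, Z_pca))     # shape (k_rp, k_pca)
    keep = sim.max(axis=1) < threshold           # one flag per RP feature
    return np.hstack([Z_pca, Z_rp[:, keep]])

# Hypothetical stand-in for a term-weighted matrix: 20 pages x 50 terms.
rng = np.random.default_rng(42)
X = rng.random((20, 50))
Z = hybridize(X, k_pca=5, k_rp=5)
print(Z.shape)  # 20 rows; between 5 and 10 columns depending on the filter
```

The resulting matrix `Z` would then be passed to a classifier in place of the full term matrix. Other rules are equally consistent with the abstract, for example ranking features from both sets by similarity and keeping a fixed number.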

More From: Iraqi Journal of Computer, Communication, Control and System Engineering

Similar Papers

A Hybrid of Proposed Filtration and Feature Selections to Enhance the Model Performance
E Sujatha ... R Radha
Indian Journal of Science and Technology | VOL. 14
25 Jun 2021

A two-step feature selection method for quranic text classification
A Adeleke ... N A Samsudin
Indonesian Journal of Electrical Engineering and Computer Science | VOL. 16
01 Nov 2019

How to reduce dimension with PCA and random projections?
Fan Yang ... David P Woodruff
IEEE Transactions on Information Theory | VOL. 67
01 Dec 2021

Author response: Sparse dimensionality reduction approaches in Mendelian randomisation with highly correlated exposures
Vasileios Karageorgiou ... Verena Zuber
28 Nov 2022
