Abstract

Big data technology has become widely adopted across industries owing to characteristics such as high value, large volume, rapid velocity, wide variety, and significant variability. Nevertheless, big data raises several challenges that must be addressed, including long processing times, high computational complexity, imprecise features, significant sparsity, irrelevant terms, redundancy, and noise, all of which can degrade the performance of feature extraction. This research tackles these issues by using the Partial Least Square Generalized Linear Regression (G-PLSGLR) approach to reduce the high dimensionality of text data. The proposed algorithm consists of four stages: first, collecting feature data in a vector space model (VSM) and training it with a bootstrap technique; second, grouping the trained feature samples using the Pearson correlation coefficient and a graph-based technique; third, removing unimportant features by ranking the significant group features using PLSGLR; and finally, selecting or extracting the significant features using the Bayesian information criterion (BIC). The G-PLSGLR algorithm surpasses existing methods, achieving a high reduction rate and strong classification performance while minimizing feature redundancy, time consumption, and complexity. Furthermore, it improves feature accuracy by 35%.
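The abstract only outlines the four stages, so the following is a minimal, hypothetical Python sketch of how such a pipeline could be wired together. `TfidfVectorizer` stands in for the VSM, scikit-learn's `PLSRegression` approximates the paper's PLS generalized linear regression ranking step, and all function names (`vsm_bootstrap`, `group_features`, `rank_features`, `bic_select`), the 0.7 correlation threshold, and the greedy BIC pass are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a four-stage G-PLSGLR-style pipeline; not the
# authors' code. Stand-ins: TF-IDF for the VSM, sklearn PLSRegression
# for the PLS generalized linear regression ranking step.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cross_decomposition import PLSRegression
from sklearn.utils import resample
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def vsm_bootstrap(docs, labels, random_state=0):
    """Stage 1: represent documents in a vector space model (TF-IDF)
    and draw a bootstrap sample of the rows (stratified so both
    classes survive resampling)."""
    X = TfidfVectorizer().fit_transform(docs).toarray()
    y = np.asarray(labels, dtype=float)
    return resample(X, y, stratify=y, random_state=random_state)


def group_features(X, threshold=0.7):
    """Stage 2: connect feature pairs whose absolute Pearson
    correlation exceeds a threshold, then take the connected
    components of that graph as feature groups."""
    corr = np.nan_to_num(np.corrcoef(X, rowvar=False))  # NaN for constant columns
    adjacency = csr_matrix(np.abs(corr) >= threshold)
    _, component = connected_components(adjacency, directed=False)
    return component  # component[j] = group id of feature j


def rank_features(X, y):
    """Stage 3: rank features by the magnitude of their PLS
    regression coefficients against the class label."""
    pls = PLSRegression(n_components=min(2, X.shape[1])).fit(X, y)
    return np.abs(pls.coef_).ravel()


def bic_select(X, y, order):
    """Stage 4: greedily keep top-ranked features while the Bayesian
    information criterion of a least-squares fit keeps improving."""
    n = len(y)
    best_bic, selected = np.inf, []
    for j in order:
        cols = selected + [j]
        beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        rss = np.sum((y - X[:, cols] @ beta) ** 2)
        bic = n * np.log(rss / n + 1e-12) + len(cols) * np.log(n)
        if bic < best_bic:
            best_bic, selected = bic, cols
    return selected


if __name__ == "__main__":
    docs = ["big data mining", "data stream velocity",
            "noisy sparse text", "text feature noise"]
    X, y = vsm_bootstrap(docs, labels=[0, 0, 1, 1])
    groups = group_features(X)
    order = np.argsort(-rank_features(X, y))
    print("feature groups:", groups)
    print("selected feature indices:", bic_select(X, y, order))
```

Treating correlated features as connected components collapses each redundant cluster cheaply, and the BIC pass then penalizes every retained feature by log n, which is what drives the reduction rate the abstract reports.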
