Big data analytics helps transform huge volumes of raw data into usable, valuable information, but the complexity and high dimensionality of such data pose formidable challenges for feature selection and classification. Traditional methods often cannot adequately handle the inherent uncertainty, imprecision, and inconsistency of big data, and they frequently underperform as a result. This paper introduces a new methodology that addresses these problems using neutrosophic sets, which support more flexible and nuanced data analysis. The key contributions of the proposed approach are threefold. First, it generalizes the classical set by extending it with degrees of truth, indeterminacy, and falsity, allowing uncertainty in the data to be represented explicitly. Second, it introduces a feature selection process grounded in neutrosophic set theory and optimized with genetic algorithms. Third, it applies the selected features to train and validate classification models across several different domains. The overall aim of this study is to improve the accuracy and reliability of feature selection and classification in big data analytics. The methodology has been implemented and tested on datasets from healthcare, finance, social media, and other domains. The results show clear improvements over conventional approaches on standard performance metrics. For example, the classification accuracy of an SVM classifier on the Cleveland Heart Disease dataset increases from 83.5% to 87.2%, and that of a Random Forest classifier on a financial dataset increases from 76.4% to 81.9%. Similarly, the accuracy of social media sentiment analysis rises from 78.3% to 82.7%. Together, these findings show that the neutrosophic set-based method offers clear advantages in overcoming the limitations of classical alternatives.
By modeling uncertainty explicitly, the proposed neutrosophic approach improves classification performance while also increasing the overall robustness and reliability of big data analytics. This study lays the groundwork for further research and practical applications and points to possible future developments in this field.
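The abstract does not specify how neutrosophic memberships are assigned to features or how the genetic algorithm's fitness is defined. Purely as an illustration, the sketch below assumes each feature has already been mapped to a single-valued neutrosophic triple (T, I, F), ranks features with one commonly used score function, (2 + T - I - F) / 3, and runs a simple genetic algorithm (truncation selection, one-point crossover, bit-flip mutation) over feature bitmasks. The class `Neutrosophic`, the functions `fitness` and `ga_select`, the size penalty, and all GA parameters are hypothetical; a real pipeline would more likely fold validated classifier accuracy (for example, from an SVM or Random Forest) into the fitness rather than use the toy additive objective shown here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Neutrosophic:
    """Single-valued neutrosophic number: truth, indeterminacy, falsity in [0, 1]."""
    t: float
    i: float
    f: float

    def __post_init__(self):
        for v in (self.t, self.i, self.f):
            if not 0.0 <= v <= 1.0:
                raise ValueError("components must lie in [0, 1]")

    def score(self) -> float:
        # One common score function for single-valued neutrosophic numbers:
        # it rewards truth and penalizes indeterminacy and falsity.
        return (2.0 + self.t - self.i - self.f) / 3.0


def fitness(mask, features, penalty):
    # Hypothetical objective: summed score of the selected features minus a
    # per-feature cost that discourages large subsets.
    return sum(nv.score() - penalty for keep, nv in zip(mask, features) if keep)


def ga_select(features, penalty=0.6, pop_size=20, generations=50,
              mutation_rate=0.05, seed=0):
    """Search feature bitmasks with a basic genetic algorithm (needs >= 2 features)."""
    rng = random.Random(seed)
    n = len(features)
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, features, penalty), reverse=True)
        elite = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [not g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda m: fitness(m, features, penalty))
```

For example, with four features whose triples are (0.9, 0.1, 0.1), (0.2, 0.5, 0.7), (0.8, 0.2, 0.1), and (0.3, 0.6, 0.6) and a penalty of 0.6, only the first and third features score above the penalty, so the best mask is `[True, False, True, False]`, which the search is expected to recover with the default settings. Keeping the elite half each generation means the best mask found so far is never lost.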