Greedy Stepwise Search Research Articles

Purpose Email is a fast and inexpensive medium for sharing information, but unsolicited email (spam) is a constant problem in email communication. The rapid growth of spam makes a reliable and robust spam classifier necessary. This paper aims to present a study of evolutionary classifiers (genetic algorithm [GA] and genetic programming [GP]) with and without an ensemble of classifiers. In this research, the classifier ensemble is built with the adaptive boosting technique. Design/methodology/approach Text mining methods are applied to classify spam and legitimate emails. Two data sets (Enron and SpamAssassin) are used to test the classifiers. Pre-processing is first performed to extract the features/words from the email files. An informative feature subset is then selected with the greedy stepwise feature subset search method. Using these informative features, a comparative study is performed first among the evolutionary classifiers and then against other popular machine learning classifiers (Bayesian, naive Bayes and support vector machine). Findings This study shows that evolutionary algorithms are promising in classification and prediction applications, and that genetic programming with adaptive boosting turns out to be not only an accurate classifier but also a sensitive one. Results show that GA initially performs better than GP, but after an ensemble of classifiers is applied (a large number of iterations), GP overtakes GA with significantly higher accuracy. Among all classifiers, boosted GP performs well not only in classification accuracy but also in its low false positive (FP) rate, which is considered an important criterion in email spam classification. Greedy stepwise feature search is also found to be an effective method for feature selection in this application domain. Research limitations/implications The research implication is a reduction in the cost incurred because of spam/unsolicited bulk email. Email is fundamental for sharing information among the units of an organization that needs to stay competitive with its business rivals. It is also a continual hurdle for internet service providers to provide the best email services to their customers. Although organizations and internet service providers continuously adopt novel spam filtering approaches to reduce the number of unwanted emails, the desired effect has not been significant because of the cost of installation, limited customizability and the threat of misclassifying important emails. This research addresses the issues and challenges faced by internet service providers and organizations. Practical implications In this research, the proposed models not only provide excellent accuracy, sensitivity with a low FP rate and customizability, but also help reduce the cost of spam. The same models may be used for other text mining applications, such as sentiment analysis, blog mining or news mining. Originality/value A comparison between GP and GA, with and without an ensemble, is shown in the spam classification application domain.
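The abstract above names greedy stepwise feature subset search but does not describe its configuration. As a rough illustration only, the following is a minimal sketch of a generic forward greedy stepwise search: start from an empty subset, add the single feature that most improves a subset score, and stop when no addition helps. The function name `greedy_stepwise_forward`, the `min_gain` threshold and the toy scoring function `toy_merit` are assumptions made for this example and are not taken from the paper.

```python
from typing import Callable, FrozenSet, List

def greedy_stepwise_forward(
    n_features: int,
    score: Callable[[FrozenSet[int]], float],
    min_gain: float = 1e-9,
) -> List[int]:
    """Forward greedy stepwise search over feature indices 0..n_features-1.

    `score` is any subset evaluator (hypothetical here); higher is better.
    """
    selected: List[int] = []
    best = score(frozenset())
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        # Evaluate every one-feature extension of the current subset.
        scored = [(score(frozenset(selected + [f])), f) for f in candidates]
        # Pick the best extension; ties go to the lower feature index.
        top_score, top_feature = max(scored, key=lambda t: (t[0], -t[1]))
        if top_score - best <= min_gain:
            break  # no extension improves the score: stop
        selected.append(top_feature)
        best = top_score
    return selected

# Toy evaluator (assumed for illustration): each feature has a relevance,
# and features 0 and 1 are redundant with each other.
RELEVANCE = [0.9, 0.8, 0.1, 0.05]

def toy_merit(subset: FrozenSet[int]) -> float:
    merit = sum(RELEVANCE[f] for f in subset)
    if 0 in subset and 1 in subset:
        merit -= 0.85  # penalty for keeping both redundant features
    return merit

chosen = greedy_stepwise_forward(len(RELEVANCE), toy_merit)
print(chosen)
```

In this toy run the search keeps feature 0, skips the redundant feature 1 and adds the two weak but non-redundant features. In a real text mining setting the evaluator would instead score a word subset against labelled emails, for example by cross-validated accuracy.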

Purpose To be sustainable and competitive in the current business environment, it is useful to understand users’ sentiment towards products and services. This critical task can be achieved via natural language processing and machine learning classifiers. This paper aims to propose a novel probabilistic committee selection classifier (PCC) to analyse and classify the sentiment polarities of movie reviews. Design/methodology/approach An Indian movie review corpus is assembled for this study. Another publicly available movie review polarity corpus is also used to validate the results. The greedy stepwise search method is used to extract the features/words of the reviews. The performance of the proposed classifier is measured using different metrics, such as F-measure, false positive rate, receiver operating characteristic (ROC) curve and training time. Further, the proposed classifier is compared with other popular machine-learning classifiers, such as Bayesian, Naïve Bayes, Decision Tree (J48), Support Vector Machine and Random Forest. Findings The results of this study show that the proposed classifier is good at predicting the positive or negative polarity of movie reviews. The accuracy and ROC curve value of the PCC are found to be the most suitable among all classifiers tested in this study. This classifier is also found to be efficient at identifying positive sentiments of reviews, giving low false positive rates for both the Indian Movie Review and Review Polarity corpora used in this study. The training time of the proposed classifier is found to be slightly higher than that of Bayesian, Naïve Bayes and J48. Research limitations/implications Only movie reviews written in English are considered. In addition, the proposed committee selection classifier is built only from a committee of probabilistic classifiers; other classifier committees could also be built, tested and compared with the present experimental scenario. Practical implications In this paper, a novel probabilistic approach is proposed and used for classifying movie reviews, and is found to be highly effective in comparison with other state-of-the-art classifiers. This classifier may be tested for different applications and may provide new insights for developers and researchers. Social implications The proposed PCC may be used to classify different product reviews, and hence may be beneficial to organizations to justify users’ reviews about specific products or services. By using authentic positive and negative sentiments of users, the credibility of the specific product, service or event may be enhanced. PCC may also be applied to other applications, such as spam detection, blog mining, news mining and various other data-mining applications. Originality/value The constructed PCC is novel and was tested on Indian movie review data.
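Two of the metrics named in the abstract above, F-measure and false positive rate, follow directly from the counts in a binary confusion matrix. The sketch below uses the standard definitions (F-measure as the harmonic mean of precision and recall, false positive rate as FP / (FP + TN)); the function names and the example counts are made up for illustration and are not results from the paper.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def false_positive_rate(fp: int, tn: int) -> float:
    """Share of actual negatives that were wrongly flagged as positive."""
    return fp / (fp + tn) if (fp + tn) else 0.0

# Hypothetical counts for 100 reviews: 50 positive, 50 negative.
tp, fp, fn, tn = 40, 10, 10, 40
print(f_measure(tp, fp, fn), false_positive_rate(fp, tn))
```

With these made-up counts, precision and recall are both 40/50, and 10 of the 50 negative reviews are misclassified as positive.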

Articles published on Greedy Stepwise Search

Multi-view feature fusion and density-based minority over-sampling technique for amyloid protein prediction under imbalanced data

Modelling Occurrence of Invasive Water Hyacinth (Eichhornia crassipes) in Wetlands

Performance Evaluation of a Proposed Machine Learning Model for Chronic Disease Datasets Using an Integrated Attribute Evaluator and an Improved Decision Tree Classifier

Cost-sensitive metaheuristic technique for credit card fraud detection

Simple Logistic Hybrid System Based on Greedy Stepwise Algorithm for Feature Analysis to Diagnose Parkinson’s Disease According to Gender

A Proposed Ensemble Model with Feature Selection Technique for Classification of Chronic Kidney Disease

A study of boosted evolutionary classifiers for detecting spam

Malicious web pages detection using feature selection techniques and machine learning

Prediction of sheep carcass traits from early-life records using machine learning

Analysing user sentiment of Indian movie reviews

A modified content-based evolutionary approach to identify unsolicited emails

Input variable selection with greedy stepwise search algorithm for analysing the probability of fish occurrence: A case study for Alburnoides mossulensis in the Gamasiab River, Iran

The Impact of Feature Selection on Urban Land Cover Classification

Diagnosis of Chronic Kidney Disease Based on Support Vector Machine by Feature Selection Methods.

A Meta-Model Implementation with Tabu Search Technique to Determine the Buying Pattern of Online Customers

Nature Inspired Feature Selection Approach for Effective Intrusion Detection

Interaction between feature subset selection techniques and machine learning classifiers for detecting unsolicited emails

Development of a 5 year life expectancy index in older adults using predictive mining of electronic health record data

Breast Cancer Prediction System using Feature Selection and Data Mining Methods

Comparison of modelling techniques to predict macroinvertebrate community composition in rivers of Ethiopia
