Distinguishing Feature Selector Research Articles

Android is a popular open-source operating system that is highly susceptible to malware attacks. Researchers have developed machine learning models, trained on attributes extracted through static or dynamic analysis, to identify malicious applications. However, such models suffer from low detection accuracy because conventional feature selection algorithms retain noisy attributes. This paper therefore proposes a new feature selection mechanism, selection of relevant attributes for improving locally extracted features using classical feature selectors (SAILS). SAILS aims to discover prominent system calls in applications and is built on top of conventional feature selection methods such as mutual information, distinguishing feature selector and Galavotti–Sebastiani–Simi, which serve as local feature selectors. In addition, a novel global feature selection method, weighted feature selection, is proposed. The proposed feature selectors are compared comprehensively with the traditional methods. SAILS improves the evaluation metrics over the conventional feature selection algorithms for machine learning models built with Logistic Regression, CART, Random Forest, XGBoost and Deep Neural Networks. The evaluations show accuracies between 95 and 99% for dropout rates of 0.1–0.8 and learning rates of 0.001–0.2. Finally, the security of the malware classifiers against adversarial examples is investigated thoroughly. Accuracy declines on adversarial examples: the recall of classifiers using SAILS falls to between 24.79 and 92.2% under attack, whereas the true positive rate before the attack is between 95.2 and 99.79%. The results suggest that attackers can bypass detection by discovering the classifier's blind spots and adding a small number of legitimate attributes.
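
The abstract names the distinguishing feature selector (DFS) as one of the local selectors but does not say how a feature is scored. The sketch below assumes the commonly cited DFS definition, DFS(t) = sum over classes C_i of P(C_i | t) / (P(not t | C_i) + P(t | not C_i) + 1), applied to binary feature presence. The function name `dfs_scores` and the toy system-call sets are hypothetical illustrations, not part of SAILS or its data.

```python
from collections import Counter


def dfs_scores(docs, labels):
    """Score each feature with the (assumed) DFS formula.

    docs   -- list of sets, each holding the features seen in one sample
    labels -- list of class labels, aligned with docs
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)

    # per_class[t][c] = number of samples of class c that contain feature t
    per_class = {}
    for feats, label in zip(docs, labels):
        for t in set(feats):
            per_class.setdefault(t, Counter())[label] += 1

    scores = {}
    for t, counts in per_class.items():
        total_with_t = sum(counts.values())
        score = 0.0
        for c, size in class_sizes.items():
            in_c = counts.get(c, 0)
            p_c_given_t = in_c / total_with_t          # P(C_i | t)
            p_not_t_given_c = 1 - in_c / size          # P(not t | C_i)
            others = n_docs - size
            # P(t | not C_i); defined as 0 when there is only one class
            p_t_given_not_c = (total_with_t - in_c) / others if others else 0.0
            score += p_c_given_t / (p_not_t_given_c + p_t_given_not_c + 1)
        scores[t] = score
    return scores


# Toy, made-up system-call sets (illustration only)
samples = [{"open", "sendto"}, {"open", "sendto"}, {"open", "read"}, {"open", "read"}]
classes = ["malware", "malware", "benign", "benign"]
for call, score in sorted(dfs_scores(samples, classes).items()):
    print(call, round(score, 3))
```

Under this definition a feature that occurs in every sample of exactly one class scores highest, while a feature spread evenly over all classes scores lowest, which is the behaviour a local selector would rely on when ranking system calls.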

Designing a good feature selection (FS) algorithm is of utmost importance, especially for text classification (TC), where the large number of features representing terms or words poses serious challenges to the effectiveness and efficiency of classifiers. FS algorithms fall into two broad categories: feature ranking (FR) and feature subset selection (FSS). Unlike FSS, FR algorithms select features that are individually highly relevant to the class or category, without taking feature interactions into account. This makes FR algorithms simpler and computationally more efficient than FSS algorithms, and thus usually the preferred choice for TC. Bi-normal separation (BNS) (Forman, 2003) and information gain (IG) (Yang and Pedersen, 1997) are well-known FR metrics. However, FR algorithms output a set of highly relevant features or terms that may be redundant and can thus degrade a classifier's performance. This paper suggests taking the interactions of words into account in order to eliminate redundant terms. Because stand-alone FSS algorithms can be computationally expensive for high-dimensional text data, we suggest a two-stage FS algorithm that employs an FR metric such as BNS or IG in the first stage and an FSS algorithm such as the Markov blanket filter (MBF) (Koller and Sahami, 1996) in the second stage. Most of the two-stage algorithms proposed in the literature for TC combine feature ranking with feature transformation algorithms such as principal component analysis (PCA). To estimate the statistical significance of our two-stage algorithm, we carry out experiments on 10 different splits of training and test sets for each of three data sets (Reuters-21578, TREC, OHSUMED) with naive Bayes and support vector machine classifiers. Our results, based on a paired two-sided t-test, show that the macro F1 performance of BNS+MBF is statistically significantly better than that of stand-alone BNS in 69% of the total experimental trials. The macro F1 values of IG improve in 72% of the trials when MBF is used in the second stage. We also compare our two-stage algorithm against two recently proposed FS algorithms, namely the distinguishing feature selector (DFS) (Uysal and Gunal, 2012) and a two-stage algorithm consisting of IG and PCA (Uguz, 2011). BNS+MBF is found to be significantly better than DFS and IG+PCA in 74% and 78% of the trials, respectively. IG+MBF outperforms DFS and IG+PCA in 93% and 80% of the experimental trials, respectively. Similar results are observed for BNS+MBF and IG+MBF when performance is evaluated in terms of balanced error rate.
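
The two-stage idea described above (rank features first, then prune redundant ones) can be sketched as follows. This is only an illustration under stated assumptions: stage 1 ranks terms by information gain over binary term presence, and stage 2 uses a simple greedy overlap filter as a stand-in for the Markov blanket filter, which is considerably more involved. The names `two_stage_select`, `top_k` and `max_overlap`, and the toy documents, are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of a term's presence/absence with respect to the class labels."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_t) / n) * entropy(with_t) \
        + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional


def jaccard(a, b):
    """Overlap between two sets of document indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def two_stage_select(docs, labels, top_k, max_overlap=0.9):
    vocab = sorted(set().union(*docs))
    # Stage 1 (feature ranking): keep the top_k terms by information gain.
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                    reverse=True)[:top_k]
    # Stage 2 (redundancy removal): greedy filter, NOT the Markov blanket
    # filter. A term is dropped if it occurs in almost the same documents
    # as a higher-ranked term that was already kept.
    postings = {t: {i for i, d in enumerate(docs) if t in d} for t in ranked}
    kept = []
    for t in ranked:
        if all(jaccard(postings[t], postings[k]) <= max_overlap for k in kept):
            kept.append(t)
    return kept


# Toy, made-up documents (illustration only)
toy_docs = [{"wheat", "grain", "the"}, {"wheat", "grain"},
            {"oil", "the"}, {"oil", "crude", "the"}]
toy_labels = ["grain", "grain", "oil", "oil"]
print(two_stage_select(toy_docs, toy_labels, top_k=3))
```

The point of the second stage is visible even in the toy data: two terms that always co-occur are equally relevant on their own, so a ranking metric keeps both, whereas the redundancy step keeps only one of them.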

Articles published on Distinguishing Feature Selector

A new metric for feature selection on short text datasets

A novel filter feature selection method for text classification: Extensive Feature Selector

Using modified term frequency to improve term weighting for text classification

Regularized Phrase-Based Topic Model for Automatic Question Classification With Domain-Agnostic Class Labels

Modified DFS-based term weighting scheme for text classification

A Novel Inherent Distinguishing Feature Selector for Highly Skewed Text Document Classification

A novel filter feature selection method using rough set for short text data

SysDroid: a dynamic ML-based android malware analyzer using system call traces

On classification of abstracts obtained from medical journals

Comparative Performance Analysis of Techniques for Automatic Drug Review Classification

Trigonometric comparison measure: A feature selection method for text categorization

COMPARATIVE ANALYSIS OF RECENT FEATURE SELECTION METHODS FOR SENTIMENT CLASSIFICATION

Selection of the most relevant terms based on a max-min ratio metric for text classification

Performance Evaluation of Filter-based Feature Selection Techniques in Classifying Portable Executable Files

Feature selection based on a normalized difference measure for text classification

Comparing the Speed and Accuracy of Multi-Label Classification Models

A two-stage Markov blanket based feature selection algorithm for text classification

Relative discrimination criterion – A novel feature ranking method for text data

A novel probabilistic feature selection method for text classification
