Prototype Selection Algorithms Research Articles

Articles published on Prototype Selection Algorithms

28 Search results

Adaptive prototype selection algorithm for fuzzy monotonic K-nearest neighbor

Journal of Intelligent & Fuzzy Systems

Apr 17, 2024
Jiankai Chen + 3

Prototype Selection for Multilabel Instance-Based Learning

Reducing the size of the training set, which involves replacing it with a condensed set, is a widely adopted practice to enhance the efficiency of instance-based classifiers while trying to maintain high classification accuracy. This objective can be achieved through the use of data reduction techniques, also known as prototype selection or generation algorithms. Although there are numerous algorithms available in the literature that effectively address single-label classification problems, most of them are not applicable to multilabel data, where an instance can belong to multiple classes. Well-known transformation methods cannot be combined with a data reduction technique, for various reasons. The Condensed Nearest Neighbor rule is a popular parameter-free single-label prototype selection algorithm. The IB2 algorithm is the one-pass variation of the Condensed Nearest Neighbor rule. This paper proposes variations of these algorithms for multilabel data. Through an experimental study conducted on nine distinct datasets as well as statistical tests, we demonstrate that the eight proposed approaches (four for each algorithm) offer significant reduction rates without compromising the classification accuracy.

Open Access

Information

Oct 19, 2023
Panagiotis Filippakis + 2
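
The single-label Condensed Nearest Neighbor rule and its one-pass variation IB2, named in the abstract above, can be sketched roughly as follows. This is a minimal illustration and not the multilabel variations the paper proposes; the function names, the Euclidean distance, and the toy data are assumptions made for the example.

```python
import math

def nearest_label(prototypes, x):
    """Return the label of the prototype closest to x (Euclidean distance)."""
    best = min(prototypes, key=lambda p: math.dist(p[0], x))
    return best[1]

def ib2(data):
    """One pass: keep an instance only if the current prototypes misclassify it."""
    prototypes = [data[0]]
    for x, y in data[1:]:
        if nearest_label(prototypes, x) != y:
            prototypes.append((x, y))
    return prototypes

def cnn(data):
    """Repeat passes over the data until a full pass adds no new prototype."""
    prototypes = [data[0]]
    changed = True
    while changed:
        changed = False
        for x, y in data:
            # The membership check guards against endless growth when the
            # same point appears with two different labels.
            if (x, y) not in prototypes and nearest_label(prototypes, x) != y:
                prototypes.append((x, y))
                changed = True
    return prototypes

# Toy one-dimensional data (hypothetical): three "a" points, two "b" points.
data = [((0.0,), "a"), ((0.1,), "a"), ((0.2,), "a"), ((1.0,), "b"), ((1.1,), "b")]
condensed_ib2 = ib2(data)
condensed_cnn = cnn(data)
```

On this toy data both routines keep one prototype per class, and the condensed set still classifies every training instance correctly with the 1-nearest-neighbor rule.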

Data reduction via multi-label prototype generation

A very common practice to speed up instance-based classifiers is to reduce the size of their training set, that is, replace it by a condensing set, hoping that their accuracy will not worsen. This can be achieved by applying a Prototype Selection or Generation algorithm, also referred to as a Data Reduction Technique. Most of these techniques cannot be applied to multi-label problems, where an instance may belong to more than one class. Reduction through Homogeneous Clustering (RHC) and Reduction by Space Partitioning (RSP3) are parameter-free single-label Prototype Generation algorithms. Both are based on recursive data partitioning procedures that identify homogeneous clusters of training data, which they replace by their representatives. This paper proposes variations of these algorithms for multi-label training datasets. The proposed methods generate multi-label prototypes and inherit all the desirable properties of their single-label versions. They consider clusters that contain instances that share at least one common label as homogeneous clusters. It is shown via an experimental study based on nine multi-label datasets that the proposed algorithms achieve good reduction rates without negatively affecting classification accuracy.

Neurocomputing

Jan 12, 2023
Stefanos Ougiaroglou + 3
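
The homogeneity criterion described in the abstract above (a cluster is homogeneous when its instances share at least one common label) can be illustrated with a small sketch. The helper names and the toy cluster are hypothetical, and using the component-wise mean as the cluster representative is an assumption made for illustration; the abstract does not spell out how the multi-label prototype is built.

```python
def shared_labels(label_sets):
    """Return the labels common to every instance in the cluster."""
    sets = [set(s) for s in label_sets]
    return set.intersection(*sets) if sets else set()

def is_homogeneous(label_sets):
    """A cluster is homogeneous when its instances share at least one label."""
    return len(shared_labels(label_sets)) > 0

def centroid(points):
    """Component-wise mean of the cluster's feature vectors (assumed representative)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

# Hypothetical cluster of two multi-label instances.
cluster_points = [(1.0, 2.0), (3.0, 4.0)]
cluster_labels = [{"sports", "news"}, {"news"}]

homogeneous = is_homogeneous(cluster_labels)
representative = centroid(cluster_points)
```

A recursive partitioning procedure would stop splitting a cluster once this check succeeds and keep splitting otherwise.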

Fast prototype selection algorithm based on adjacent neighbourhood and boundary approximation

The unceasing growth in data volume severely limits the application of mature classification algorithms because of unacceptable execution time and insufficient memory. A way to obtain a high-quality decision reference set quickly and incrementally, and to adapt to incremental data, is urgently needed for incremental environments and large datasets. This paper proposes a novel prototype selection algorithm that integrates the strategies of condensing and editing methods. For an unlearned pattern, the algorithm extends the reference scope from its single nearest neighbour to its k nearest neighbours, which expands the information available for judging its detailed neighbour relationships. Whether the pattern is a prototype is then determined from its neighbour relationships and a strategy that asymptotically approximates the classification boundary. To maintain a high-quality reference set, the algorithm periodically updates prototypes that lie in the non-boundary zone or have gone unlearned for a long time. The empirical study shows that, compared with the other algorithms, this algorithm obtains a smaller set of higher-quality boundary prototypes without decreasing classification accuracy or reduction rate.

Open Access

Scientific Reports

Nov 22, 2022
Juan Li + 1

A hybrid prototype selection-based deep learning approach for anomaly detection in industrial machines

Anomaly detection in time series is an important task in many applications, e.g., the maintenance policies of rotating machines within industries strongly rely on time series monitoring. Rotating machines are vital elements within industries. Therefore, maintenance policies on these critical elements concern the quality of products and safety issues. Condition-based maintenance is an example of those policies. In this context, we propose a novel method to train a deep learning-based feature extractor for the anomaly detection problem on rotating machinery. It consists of using a prototype selection algorithm to improve the training process of a randomly initialized feature extractor. We perform this process iteratively using data belonging to one probability distribution, i.e., the normal class. We carried out the prototype selection with the Nearest Neighbors algorithm, and the feature extractor was a Convolutional Neural Network. We validated the method on three datasets of spectrograms related to gearbox and compressor faults and achieved promising results. We obtained detection rates in anomalous data close to 100%, and the anomaly detectors classified normal instances with accuracy values above 95%. Those results were competitive with other deep learning-based anomaly detectors in the literature, with the advantage of being an integrated solution.

Expert Systems with Applications

May 14, 2022
Rodrigo De Paula Monteiro + 4

Histogram Entropy Representation and Prototype Based Machine Learning Approach for Malware Family Classification

The volume of malware has steadily increased as malware spreading and evasion techniques have advanced. Machine learning has contributed to making malware analysis more efficient by detecting various behavioral and evasion patterns. However, when analyzing large-scale malware datasets, malware analysis through learning models has both high temporal and spatial complexity. In order to address these problems, this work proposes a low-dimensional feature using histogram entropy and a prototype selection algorithm using hyperrectangles. The low-dimensional feature forms an $L \times 256$ map according to the preselected parameter $L$. The prototype selection algorithm divides the input space into overlapping subspaces where each subspace is decided by its hyperrectangle that becomes a prototype in the same class. A set cover optimization algorithm is employed to select a small number of prototypes that construct a new training dataset. A set of prototypes selected by the prototype selection algorithm is used to classify malware families. The experiment compares the performance of machine learning models for the histogram entropy feature using both the BIG 2015 dataset and the collected dataset. The integrated approach is evaluated using learning algorithms, such as Decision Tree, Random Forest, XGBoost, and CNN. The experimental results indicate that learning models trained on the selected prototypes perform competitively with models trained on the entire dataset, while the proposed selection approach benefits from smaller datasets and lower time complexity.

Open Access

IEEE Access: Practical Innovations, Open Solutions

Jan 1, 2021
Byunghyun Baek + 4

Web Objectionable Video Recognition Based on Deep Multi-Instance Learning With Representative Prototypes Selection

To protect underage people from accessing objectionable videos on the Internet, an effective objectionable video recognition algorithm is necessary for web filtering. Recently, multi-instance learning has been introduced for objectionable video recognition and has achieved impressive results. However, hand-crafted features as well as redundant and noisy frames in objectionable videos remain an intractable problem that inevitably degrades recognition performance. In this paper, we propose a novel representative prototype selection algorithm embedding deep multi-instance representation learning. In the proposed method, an improved convolutional neural network is designed for multimodal multi-instance feature learning, and a self-expressive dictionary learning model based on sparse and low-rank constraints is designed to select the representative prototypes from each subspace of instances. Then the bag-level feature is constructed by mapping the bag to the selected prototypes. Experiments on three objectionable video sets show the effectiveness of our method for objectionable video recognition.

IEEE Transactions on Circuits and Systems for Video Technology

May 5, 2020
Xinmiao Ding + 6

Instance-based classification using prototypes generated from large noisy and streaming datasets

Nowadays, large volumes of training data are available from various data sources and streaming environments. Instance-based classifiers perform adequately when they use only a small subset of such datasets. Larger data volumes introduce high computational cost that prohibits the timely execution of the classification process. Conventional prototype selection and generation algorithms are also inappropriate for data streams and large datasets. In the past, we proposed prototype generation algorithms that maintain a dynamic set of prototypes and are appropriate for such types of data. The set is dynamic because existing prototypes may be updated, or new prototypes may be appended to the set of prototypes in the course of processing. Still, repetitive generation of new prototypes may result in unpredictably large sets of prototypes. In this paper, we propose a new variation of our algorithm that maintains the prototypes in a convenient and manageable way. This is achieved by removing the weakest prototype when a new prototype is generated. The new algorithm has been tested on several datasets. The experimental results reveal that it is as accurate as its predecessor, yet it is more efficient and noise tolerant.

Open Access

Computer Science and Information Systems

Dec 26, 2019
Stefanos Ougiaroglou + 2

Extensions to rank-based prototype selection in k-Nearest Neighbour classification

The k-nearest neighbour rule is commonly considered for classification tasks given its straightforward implementation and good performance in many applications. However, its efficiency represents an obstacle in real-case scenarios because the classification requires computing a distance to every single prototype of the training set. Prototype Selection (PS) is a typical approach to alleviate this problem, which focuses on reducing the size of the training set by selecting the most interesting prototypes. In this context, rank methods have been postulated as a good solution: following some heuristics, these methods perform an ordering of the prototypes according to their relevance in the classification task, which is then used to select the most relevant ones. This work presents a significant improvement of existing rank methods by proposing two extensions: (i) a greater robustness against noise at label level by considering the parameter ‘k’ of the classification in the selection process; and (ii) a new parameter-free rule to select the prototypes once they have been ordered. The experiments performed in different scenarios and datasets demonstrate the merit of these extensions. Also, it is empirically shown that the new full approach is competitive with respect to existing PS algorithms.

Open Access

Applied Soft Computing Journal

Sep 26, 2019
Juan Ramón Rico-Juan + 2

EEG-Based Emotion Recognition with Prototype-Based Data Representation.

Emotions play an important role in human communication, and EEG signals are widely used for emotion recognition. Despite extensive EEG research in recent years, it is still challenging to interpret EEG signals effectively because of the heavy noise they contain. In this paper, we propose an effective emotion recognition framework with two main parts: the representation network and the prototype selection algorithm. Through our proposed representation network, samples from the same emotional state lie closer to each other in the high-level representation; we then select prototypes from the clustering set in feature space to match subsequent test samples. This method takes advantage of the powerful representation ability of deep learning and learns a more descriptive feature space rather than learning the classifier explicitly. Experiments on the SEED dataset achieve a high accuracy of 93.29%, outperforming a set of baseline methods and recent deep learning emotion classification approaches. These experimental results demonstrate the effectiveness of our proposed emotion recognition framework.

Annual International Conference of the IEEE Engineering in Medicine and Biology Society

Jul 1, 2019
Yixin Wang + 6

Comparison of Prototype Selection Algorithms Used in Construction of Neural Networks Learned by SVD

Radial basis function networks (RBFNs) or extreme learning machines (ELMs) can be seen as linear combinations of kernel functions (hidden neurons). Kernels can be constructed in random processes like in ELMs, or the positions of kernels can be initialized by a random subset of training vectors, or kernels can be constructed in a (sub-)learning process (sometimes by k-means, for example). We found that kernels constructed using prototype selection algorithms provide very accurate and stable solutions. What is more, prototype selection algorithms automatically choose not only the placement of prototypes, but also their number. Thanks to this advantage, it is no longer necessary to estimate the number of kernels with time-consuming multiple train-test procedures. The best results of learning can be obtained by pseudo-inverse learning with a singular value decomposition (SVD) algorithm. The article presents a comparison of several prototype selection algorithms co-working with singular value decomposition-based learning. The presented comparison clearly shows that the combination of prototype selection and SVD learning of a neural network is significantly better than a random selection of kernels for the RBFN or the ELM, the support vector machine or the kNN. Moreover, the presented learning scheme requires no parameters except for the width of the Gaussian kernel.

Open Access

International Journal of Applied Mathematics and Computer Science

Dec 1, 2018
Norbert Jankowski

Study of data transformation techniques for adapting single-label prototype selection algorithms to multi-label learning

In this paper, the focus is on the application of prototype selection to multi-label data sets as a preliminary stage in the learning process. There are two general strategies when designing Machine Learning algorithms that are capable of dealing with multi-label problems: data transformation and method adaptation. These strategies have been successfully applied in obtaining classifiers and regressors for multi-label learning. Here we investigate the feasibility of data transformation in obtaining prototype selection algorithms for multi-label data sets from three prototype selection algorithms for single-label data. The data transformation methods used were: binary relevance, dependent binary relevance, label powerset, and random k-labelsets. The general conclusion is that the methods of prototype selection obtained using data transformation are not better than those obtained through method adaptation. Moreover, prototype selection algorithms designed for multi-label data do not do an entirely satisfactory job: although they reduce the size of the data set without significantly affecting the accuracy, the classifier trained with the reduced data set does not improve on the accuracy of the classifier trained with the whole data set.

Expert Systems with Applications

May 26, 2018
Álvar Arnaiz-González + 3
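
Two of the data transformation methods listed in the abstract above, binary relevance and label powerset, can be sketched as follows. This is a simplified illustration under assumed data structures (instances as pairs of a feature tuple and a label set); the function names are hypothetical and the sketch does not reproduce the paper's experimental pipeline.

```python
def binary_relevance(instances, labels):
    """Build one binary single-label dataset per label: does the instance carry it?"""
    return {
        label: [(x, label in ys) for x, ys in instances]
        for label in labels
    }

def label_powerset(instances):
    """Build one multi-class dataset: each distinct label combination is a class."""
    return [(x, tuple(sorted(ys))) for x, ys in instances]

# Hypothetical multi-label instances: (features, set of labels).
instances = [
    ((0.1, 0.2), {"a", "b"}),
    ((0.9, 0.8), {"b"}),
    ((0.5, 0.5), {"a", "b"}),
]

br = binary_relevance(instances, ["a", "b"])
lp = label_powerset(instances)
```

After either transformation, a single-label prototype selection algorithm can be run on each resulting dataset; how the per-dataset selections are merged back is a separate design decision not shown here.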

Prototype-based classification and error analysis under bootstrapping strategy

A prototype-based classification is proposed to select handfuls of class data for learning rules and prediction. A class point is considered as a prototype if it forms a hypersphere that represents a part of class area measured by any distance metric and class label. The prototype selection algorithm, formulated as a set covering optimisation, selects as few within-class points as possible while preserving class covering regions for the unknown data distribution. The upper bound of the error is analysed to compare the effectiveness of the prototype-based classification with the Bayes classifier. Under a bootstrapping strategy and the 0/1 loss, the bias and variance components are derived from a generalisation error without assuming the unknown distribution of a given problem. This analysis provides a way to evaluate prototype-based models and select the optimal model estimate for any standard classifier. The experiments show that the proposed approach is very competitive when compared to the nearest neighbour and the Bayes classifier and efficient in choosing prototypes in terms of class covering regions, data size and computation time.

International Journal of Data Mining, Modelling and Management

Jan 1, 2018
Doosung Hwang + 1
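
The set covering view of prototype selection described in the abstract above can be illustrated with a greedy approximation. This sketch is an assumption-laden stand-in and not the paper's optimisation: each point's hypersphere radius is taken here as the distance to its nearest other-class point, Euclidean distance is used, and a greedy heuristic replaces whatever exact or relaxed set cover solver the authors use. All names are hypothetical.

```python
import math

def select_prototypes(data):
    """Greedy set cover over same-class hyperspheres centred on training points."""
    n = len(data)
    covers = []
    for i, (xi, yi) in enumerate(data):
        # Assumed radius: distance to the nearest point of any other class.
        radius = min(
            (math.dist(xi, xj) for xj, yj in data if yj != yi),
            default=math.inf,
        )
        # A point always covers itself, which guarantees termination even
        # when identical points carry different labels.
        covers.append({i} | {
            j for j, (xj, yj) in enumerate(data)
            if yj == yi and math.dist(xi, xj) < radius
        })
    uncovered = set(range(n))
    chosen = []
    while uncovered:
        # Pick the hypersphere that covers the most still-uncovered points.
        best = max(range(n), key=lambda i: len(covers[i] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return [data[i] for i in chosen]

# Toy one-dimensional data (hypothetical).
data = [((0.0,), "a"), ((0.2,), "a"), ((0.4,), "a"), ((1.0,), "b"), ((1.2,), "b")]
prototypes = select_prototypes(data)
```

On this toy data one hypersphere per class is enough to cover every training point, so two prototypes are returned.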

Prototype-based classification and error analysis under bootstrapping strategy

International Journal of Data Mining, Modelling and Management

Jan 1, 2018
Youngju Son + 1

A clustering-based hybrid approach for dual data reduction

Research on data reduction techniques has become important to enhance the efficacy and efficiency of data mining algorithms, which may otherwise be compromised in the presence of a large number of irrelevant attributes and redundant instances. Data can be reduced by selecting a subset of either attributes or instances. Dual selection treats the problem of feature and instance selection together as a single optimisation problem. The problem of dual selection is relatively difficult as it involves an enormously large search space. In this paper, we propose a hybrid instance-feature selection method (HIFS-CHC) that uses the CHC adaptive search genetic algorithm, based on heterogeneous recombination and cataclysmic mutation, to solve the problem of dual selection. The proposed approach works in two stages. In the first stage, the K-means clustering algorithm is used to reduce the search space. The second stage incorporates stratified prototype selection and the CHC algorithm for data reduction. The clustering-based hybrid scheme is experimentally tested on sixteen benchmark datasets and compared with other similar data reduction algorithms with respect to predictive accuracy, reduction rate and execution time. Experimental results show that the proposed method outperforms the other methods in terms of reduction rate and execution time while preserving the predictive accuracy at almost the same level.

International Journal of Intelligent Engineering Informatics

Jan 1, 2018
Saroj Ratnoo + 2

Prototype selection to improve monotonic nearest neighbor

Student surveys occupy a central place in the evaluation of courses at teaching institutions. At the end of each course, students are requested to evaluate various aspects such as activities, methodology, coordination or resources used. In addition, a final qualification is given to summarize the quality of the course. The prediction of this final qualification can be accomplished by using monotonic classification techniques. The outcome offered by these surveys is particularly significant for faculty and teaching staff associated with the course. The monotonic nearest neighbor classifier is one of the most relevant algorithms in monotonic classification. However, it suffers from two drawbacks: (a) inefficient execution time in classification and (b) sensitivity to non-monotonic examples. Prototype selection is a data reduction process for classification based on nearest neighbor that can be used to alleviate these problems. This paper proposes a prototype selection algorithm called Monotonic Iterative Prototype Selection (MONIPS). Our objective is two-fold. The first aim is to introduce MONIPS as a method for obtaining monotonic solutions; MONIPS has proved to be competitive with classical prototype selection solutions adapted to the monotonic domain. The second is to further demonstrate the good performance of MONIPS in the context of a student survey about taught courses.

Engineering Applications of Artificial Intelligence

Feb 17, 2017
José-Ramón Cano + 4

Selection of effective training instances for scalable automatic image annotation

Automatic image annotation means employing learning models to describe the visual content of images using text descriptors. With the fast growth of digital images on the web, large-scale automatic image annotation has started to face major challenges. The most important challenges are scalability and annotation performance. In this research, the prototype selection approach is used to address scalability and annotation time. Prototype selection assumes single-label instances, whereas in image annotation an instance has more than one label; that is, instances are multi-label. Hence, to employ prototype selection algorithms in image annotation, focusing on the concept of multi-label is a critical task. Choosing an appropriate measure in these methods to compute the dissimilarity between label vectors is therefore of great importance. The approach proposed in this paper adapts prototype selection methods to multi-label data by selecting and modifying an appropriate binary dissimilarity measure for comparing two label vectors. Experiments on the large-scale NUS-WIDE family of image sets show the effectiveness of the proposed approach in reducing the number of training instances, selecting effective ones, and improving annotation performance.

Multimedia Tools and Applications

May 16, 2016
Hamid Kargar Shooroki + 1
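
The abstract above relies on a binary dissimilarity measure between label vectors but does not name it here. As a hedged illustration only, two common binary dissimilarities (normalised Hamming and Jaccard) are sketched below; neither is claimed to be the measure the authors selected or modified, and the function names and vectors are hypothetical.

```python
def hamming_dissimilarity(u, v):
    """Fraction of label positions on which the two binary vectors disagree."""
    return sum(a != b for a, b in zip(u, v)) / len(u)

def jaccard_dissimilarity(u, v):
    """One minus the ratio of shared labels to labels present in either vector."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    # Two all-zero vectors are treated as identical (an assumption).
    return 0.0 if either == 0 else 1.0 - both / either

# Hypothetical label vectors over four labels.
u = [1, 0, 1, 0]
v = [1, 1, 0, 0]

d_hamming = hamming_dissimilarity(u, v)
d_jaccard = jaccard_dissimilarity(u, v)
```

A single-label prototype selection rule that asks "do these two instances have the same class?" could be adapted by asking instead whether such a dissimilarity falls below a chosen threshold; that threshold is a further assumption not taken from the source.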

On the suitability of Prototype Selection methods for kNN classification with distributed data

In the current Information Age, data production and processing demands are ever increasing. This has motivated the appearance of large-scale distributed information. This phenomenon also applies to Pattern Recognition, so that classic and common algorithms, such as the k-Nearest Neighbour, cannot be used. To improve the efficiency of this classifier, Prototype Selection (PS) strategies can be used. Nevertheless, current PS algorithms were not designed to deal with distributed data, and their performance is therefore unknown under these conditions. This work is devoted to carrying out an experimental study on a simulated framework in which PS strategies can be compared under classical conditions as well as those expected in distributed scenarios. Our results report a general behaviour that degrades as conditions approach more realistic scenarios. However, our experiments also show that some methods are able to achieve a performance fairly similar to that of the non-distributed scenario. Thus, although there is a clear need for developing specific PS methodologies and algorithms for tackling these situations, those that reported a higher robustness against such conditions may be good candidates from which to start.

Open Access

Neurocomputing

May 10, 2016
Jose J Valero-Mas + 2

Improving nearest neighbor classification using Ensembles of Evolutionary Generated Prototype Subsets

Evolutionary approaches are among the most accurate types of prototype selection algorithms, which are preprocessing techniques that select a subset of instances from the data before applying nearest neighbor classification to it. These algorithms result in very high accuracy and reduction rates, but unfortunately come at a substantial computational cost. In this paper, we introduce a framework that makes efficient use of the intermediary results of the prototype selection algorithms to further increase their accuracy. First, instead of only using the fittest prototype subset generated by the evolutionary algorithm, we use multiple prototype subsets in an ensemble setting. Second, in order to classify a test instance, we only use prototype subsets that accurately classify training instances in the neighborhood of that test instance. In an experimental evaluation, we apply our new framework to four state-of-the-art prototype selection algorithms and show that, by using our framework, more accurate results are obtained after fewer evaluations of the prototype selection method. We also present a case study with a prototype generation algorithm, showing that our framework is easily extended to other preprocessing paradigms as well.

Open Access

Applied Soft Computing Journal

Apr 6, 2016
Nele Verbiest + 4

Data-Distribution-Aware Fuzzy Rough Set Model and its Application to Robust Classification.

Fuzzy rough sets (FRSs) are considered to be a powerful model for analyzing uncertainty in data. This model encapsulates two types of uncertainty: 1) fuzziness coming from the vagueness in human concept formation and 2) roughness rooted in the granulation coming with human cognition. The rough set theory has been widely applied to feature selection, attribute reduction, and classification. However, it is reported that the classical FRS model is sensitive to noisy information. To address this problem, several robust models have been developed in recent years. Nevertheless, these models do not consider a statistical distribution of data, which is an important type of uncertainty. Data distribution serves as crucial information for designing an optimal classification or regression model. Thus, we propose a data-distribution-aware FRS model that considers distribution information and incorporates it in computing lower and upper fuzzy approximations. The proposed model considers not only the similarity between samples, but also the probability density of classes. In order to demonstrate the effectiveness of the proposed model, we design a new sample evaluation index for prototype-based classification based on the model, and a prototype selection algorithm is developed using this index. Furthermore, a robust classification algorithm is constructed with prototype covering and nearest neighbor classification. Experimental results confirm the robustness and effectiveness of the proposed model.

IEEE Transactions on Cybernetics

Jan 1, 2015
Shuang An + 4
