Supervised Classification Problem Research Articles

Background: In an era of “big data,” computationally efficient and privacy-aware solutions for large-scale machine learning problems become crucial, especially in the healthcare domain, where large amounts of data are stored in different locations and owned by different entities. Past research has focused on centralized algorithms, which assume the existence of a central data repository (database) that stores and can process the data from all participants. Such an architecture, however, can be impractical when data are not centrally located; it does not scale well to very large datasets and introduces single-point-of-failure risks that could compromise the integrity and privacy of the data. Given large amounts of data widely spread across hospitals and individuals, a decentralized, computationally scalable methodology is very much needed.

Objective: We aim to solve a binary supervised classification problem to predict hospitalizations for cardiac events using a distributed algorithm. We seek to develop a general decentralized optimization framework enabling multiple data holders to collaborate and converge to a common predictive model, without explicitly exchanging raw data.

Methods: We focus on the soft-margin l1-regularized sparse Support Vector Machine (sSVM) classifier. We develop an iterative cluster Primal Dual Splitting (cPDS) algorithm for solving the large-scale sSVM problem in a decentralized fashion. Such a distributed learning scheme is relevant for multi-institutional collaborations or peer-to-peer applications, allowing the data holders to collaborate while keeping every participant's data private.

Results: We test cPDS on the problem of predicting hospitalizations due to heart diseases within a calendar year based on information in the patients' Electronic Health Records prior to that year. cPDS converges faster than centralized methods at the cost of some communication between agents. It also converges faster and with less communication overhead compared to an alternative distributed algorithm. In both cases, it achieves similar prediction accuracy, measured by the Area Under the Receiver Operating Characteristic Curve (AUC) of the classifier. We extract important features discovered by the algorithm that are predictive of future hospitalizations, thus providing a way to interpret the classification results and inform prevention efforts.
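To make the decentralized setting concrete, the sketch below evaluates one common form of an l1-regularized soft-margin SVM objective (hinge loss plus an l1 penalty) as a sum of per-holder loss terms. It is not the cPDS algorithm and may differ from the exact sSVM formulation used in the article; the function names, the penalty weight `lam`, and the toy data are illustrative assumptions. It only shows that the data term splits across data holders, so each holder can compute its part without sharing raw records.

```python
# Illustrative sketch only: an assumed hinge + l1 objective, not the paper's cPDS method.

def hinge_loss(w, b, x, y):
    """Hinge loss of one sample (x, y), with label y in {-1, +1}."""
    margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)

def local_loss(w, b, samples):
    """Sum of hinge losses over the samples held by a single data holder."""
    return sum(hinge_loss(w, b, x, y) for x, y in samples)

def objective(w, b, agents, lam):
    """Global objective: per-holder loss terms plus an l1 penalty on w."""
    data_term = sum(local_loss(w, b, samples) for samples in agents)
    return data_term + lam * sum(abs(wj) for wj in w)

# Two hypothetical data holders, each with one toy sample.
agents = [
    [((1.0, 0.0), 1)],
    [((0.0, 1.0), -1)],
]
value = objective((0.5, -0.5), 0.0, agents, lam=0.1)
```

Only the scalar `local_loss` values (or, in a real solver, local updates) would need to cross institutional boundaries here; the samples themselves stay with their holders.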

Abstract: In many remote-sensing projects, one is usually interested in a small number of land-cover classes present in a study area and not in all the land-cover classes that make up the landscape. Previous studies in supervised classification of satellite images have tackled the specific class mapping problem by isolating the classes of interest, combining all other classes into one large class, usually called "others", and developing a binary classifier to discriminate the class of interest from the others. Here, this approach is called the focused approach. The strength of the focused approach is that it decomposes the original multi-class supervised classification problem into a binary classification problem, focusing the process on the discrimination of the class of interest. Previous studies have shown that this method discriminates the classes of interest more accurately than the standard multi-class supervised approach. However, it may be susceptible to data imbalance problems in the training data set, since the classes of interest are often a small part of the training set. As a result, the classification may be biased towards the largest classes and thus be sub-optimal for the discrimination of the classes of interest. This study presents a way to minimize the effects of data imbalance problems in specific class mapping using cost-sensitive learning. In this approach, errors committed in the minority class are treated as costlier than errors committed in the majority class. Cost-sensitive approaches are typically implemented by weighting training data points according to their importance to the analysis. By changing the weight of individual data points, it is possible to shift the weight from the larger classes to the smaller ones, balancing the data set.
To illustrate the use of the cost-sensitive approach to map specific classes of interest, a series of experiments with a weighted support vector machine classifier and Landsat Thematic Mapper data was conducted to discriminate two types of mangrove forest (high-mangrove and low-mangrove) in the Saloum estuary, Senegal, a United Nations Educational, Scientific and Cultural Organisation World Heritage site. Results suggest an increase in overall classification accuracy with the cost-sensitive method (97.3%) over the standard multi-class approach (94.3%) and the focused approach (91.0%). In particular, the cost-sensitive method yielded higher sensitivity and specificity values in the discrimination of the classes of interest when compared with the standard multi-class and focused approaches.
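The weighting idea described above can be sketched in a few lines. The abstract does not state which weighting scheme the study used, so the inverse-frequency rule below (weight = n / (k × class count), a widely used "balanced" heuristic) is an assumption, as are the function names and the toy labels. The sketch shows how such weights make each class contribute equally to a weighted hinge loss.

```python
# Illustrative sketch only: an assumed inverse-frequency weighting, not the study's exact scheme.
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n / (k * count), so rarer classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

def weighted_hinge(scores, labels, weights):
    """Hinge loss in which each sample's error is scaled by its class weight."""
    return sum(weights[y] * max(0.0, 1.0 - y * s) for s, y in zip(scores, labels))

# Toy imbalanced set: one sample of the class of interest (+1), three "others" (-1).
labels = [1, -1, -1, -1]
weights = balanced_class_weights(labels)
loss = weighted_hinge([0.0, 0.0, 0.0, 0.0], labels, weights)
```

With these weights, the single minority sample carries as much total weight as the three majority samples combined, which is the balancing effect the abstract describes.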

Articles published on Supervised Classification Problem

Identifying e-Commerce in Enterprises by means of Text Mining and Classification Algorithms

A novel semi supervised approach for text classification

Extended Box Clustering for Classification Problems

Federated learning of predictive models from federated Electronic Health Records

Multiple kernel learning using single stage function approximation for binary classification problems

A Comparative Analysis of a Novel Anomaly Detection Algorithm with Neural Networks

Brillouin Optical Time-Domain Analyzer Assisted by Support Vector Machine for Ultrafast Temperature Extraction

A comparison of fitness-case sampling methods for genetic programming

Graph Regularized Restricted Boltzmann Machine.

Robust pattern decoding in shape-coded structured light

Optimization of Gene Set Annotations Using Robust Trace-Norm Multitask Learning.

Improving specific class mapping from remotely sensed data by cost-sensitive learning

Nonparametric regression on contaminated functional predictor with application to hyperspectral data

A new nearest neighbor classification method based on fuzzy set theory and aggregation operators

Optimization approaches to Supervised Classification

Dynamic statistical classification

Target curricula via selection of minimum feature sets

A new topological entropy-based approach for measuring similarities among piecewise linear functions

Vine copula classifiers for the mind reading problem

Machine learning models, epistemic set-valued data and generalized loss functions: An encompassing approach
