Abstract

Many real-world data are labeled with naturally ordered categories, i.e., ordinal labels. Examples can be found in a wide variety of fields. Ordinal regression is the problem of predicting ordinal labels for given patterns. Specially developed ordinal regression methods exist to tackle this type of problem, but they are usually centralized. However, in some scenarios, data are collected in a distributed manner by the nodes of a network. For privacy protection or due to practical constraints, it is difficult or impossible to transmit the data to a fusion center for processing, so centralized ordinal regression methods are inapplicable. In this paper, we formulate a distributed generalized ordered logit model for distributed ordinal regression. To estimate the parameters of the model, a distributed constrained optimization formulation based on maximum likelihood estimation is established. We then propose a projected-gradient-based algorithm to solve the optimization problem, and we prove the consensus and convergence of the proposed distributed algorithm. We also conduct numerical simulations on synthetic and real-world datasets. Simulation results show that the proposed distributed algorithm is comparable to the corresponding centralized algorithm, and that it retains competitive performance even when the label distribution among nodes is unbalanced.

Highlights

  • Classification, where data labels are restricted to a limited set of values, is a hot topic in machine learning and data mining

  • When data are collected in a distributed manner by the nodes of a network and cannot easily be transmitted to a fusion center for processing, centralized ordinal regression methods are not applicable

  • We extend the generalized ordered logit model to distributed ordinal regression


Summary

INTRODUCTION

Classification, where data labels are restricted to a limited set of values, is a hot topic in machine learning and data mining. Cost-sensitive classification methods [12], [13], which penalize different misclassification errors differently, are adopted to better exploit the ordering information, but they require additional knowledge of appropriate measurements of label distances, which is usually unavailable in ordinal regression problems. To estimate the parameters of the model, we define the cost function as the negative log-likelihood and formulate a distributed constrained optimization problem. The objective is to find a set of linear mapping functions and thresholds with parameters {wq, bq}q=1,...,Q−1.

B. DISTRIBUTED GENERALIZED ORDERED LOGIT ALGORITHM

To solve problem (12), we use the penalty function method: each node m minimizes Jm(θm) plus a penalty term on the disagreement between its parameters and those of its neighbors. When k is sufficiently large, since limk→+∞ ηk = 0, the penalty coefficient λmn becomes sufficiently large, which makes the solution of the penalized distributed optimization problem (13) nearly equal to the solution of problem (12); at this stage, the main purpose of the algorithm is to reach consensus. Because the computation at each node can be performed in parallel, the per-iteration computing time of the proposed dgologit algorithm can be shorter than that of the corresponding centralized algorithm.
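The model and cost described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the parameterization P(y > q | x) = sigmoid(wq·x + bq) with threshold-specific weights is one common form of the generalized ordered logit model, and all function and variable names here are our own. Valid (nonnegative) class probabilities require the cumulative terms to be monotone, which is presumably what the constraints in the constrained optimization formulation enforce.

```python
import numpy as np

def gologit_probs(x, W, b):
    """Class probabilities under a generalized ordered logit model.

    x: (d,) feature vector; W: (Q-1, d) threshold-specific weights;
    b: (Q-1,) biases. Assumes P(y > q | x) = sigmoid(w_q . x + b_q).
    """
    s = 1.0 / (1.0 + np.exp(-(W @ x + b)))   # P(y > q) for q = 1..Q-1
    cum = np.concatenate(([1.0], s, [0.0]))  # P(y > 0) = 1, P(y > Q) = 0
    return cum[:-1] - cum[1:]                # P(y = q): adjacent differences

def local_nll(X, y, W, b):
    """Negative log-likelihood (the local cost J_m) on one node's data.

    X: (n, d) patterns; y: (n,) labels encoded as 0..Q-1.
    """
    return -sum(np.log(gologit_probs(x, W, b)[yi]) for x, yi in zip(X, y))
```

In the distributed setting sketched in the text, each node m would minimize its local negative log-likelihood plus a penalty on the squared distance between its parameters and those of neighboring nodes, with the penalty weight growing across iterations so that the nodes are driven to consensus.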

THEORETICAL ANALYSIS
NUMERICAL SIMULATIONS
SYNTHETIC DATA
Findings
CONCLUSION AND DISCUSSION