Abstract
We consider the problem of learning a density function from observations of an unknown underlying model in a distributed setting, where the observations are partitioned across different sites. Applying commonly used density estimation methods such as the Gaussian Mixture Model (GMM) or Kernel Density Estimation (KDE) to distributed data incurs an extensive amount of communication. A common approach to address this issue is to sample a small subset of the data, collect it at a central node, and run the density estimation algorithm there. In this paper, we follow an alternative to the sub-sampling approach by proposing the nested Log-Poly model. This model provides an accurate density estimate from a small-sized statistic of the entire data. In distributed settings, only these small-sized statistics are transferred from the client nodes to a central node, where the estimation process is then run. The proposed model can be used in different learning tasks, such as classification in supervised learning and clustering in unsupervised learning. In particular, the properties of nested Log-Poly make it well suited to one-dimensional density estimation in distributed settings. This makes Log-Poly a good choice for the naive Bayes classifier, where a one-dimensional density estimate is required for every feature conditioned on the class label. We provide a theoretical analysis of the efficiency of our model in estimating a wide range of probability density functions. Our experiments show that nested Log-Poly outperforms state-of-the-art density estimators on several synthetic datasets. We compare the accuracy and the communication load of the naive Bayes classifier using nested Log-Poly and other related density estimators on several real datasets. The experimental results show that nested Log-Poly has a lower communication load while maintaining competitive classification accuracy compared to similar methods that use the entire data.
Moreover, we present a comprehensive comparison between nested Log-Poly and validated KDE with sub-sampling, in terms of the number of communicated variables and the number of bytes transferred between the clients and the central node. Nested Log-Poly provides accuracy comparable to validated KDE with sub-sampling while communicating fewer variables. However, our method needs to compute and transmit these variables with high precision in order to accurately capture the details of the underlying distributions.
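The abstract does not spell out the nested Log-Poly statistic itself, but the key property it relies on — a fixed-size summary that clients can compute locally and that the central node can merge by simple addition — holds for any exponential-family density. As an illustrative sketch (not the paper's exact method), assume a log-polynomial density f(x) ∝ exp(Σ θ_k x^k); its sufficient statistics are the power sums Σ x^k, so each client only transmits O(degree) numbers:

```python
import numpy as np

def client_statistics(x, degree=4):
    """Per-client sufficient statistics for a log-polynomial density:
    the sample count and the power sums sum(x**k), k = 1..degree.
    (Illustrative; the nested Log-Poly statistic may differ.)"""
    return len(x), np.array([np.sum(x ** k) for k in range(1, degree + 1)])

def merge_statistics(stats):
    """Central node: combine client statistics by elementwise addition --
    the communication cost is O(degree) numbers per client, independent
    of each client's sample size."""
    n = sum(s[0] for s in stats)
    moments = sum(s[1] for s in stats)
    return n, moments

rng = np.random.default_rng(0)
sites = [rng.normal(size=1000) for _ in range(3)]        # data partitioned over 3 sites
stats = [client_statistics(x) for x in sites]            # computed locally at each site
n, moments = merge_statistics(stats)                     # tiny messages sent to the centre

# Aggregation is lossless: merged statistics equal those of the pooled data.
full_n, full_moments = client_statistics(np.concatenate(sites))
assert n == full_n and np.allclose(moments, full_moments)
```

This also illustrates the precision caveat in the abstract: the density estimate is recovered from these few transmitted numbers alone, so they must be sent at high precision.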