Using data to build a better EM: EM* for big data

Hasan Kurban, Mark Jenne, Mehmet M Dalkilic

doi:10.1007/s41060-017-0062-1

Abstract

Existing data mining techniques, particularly iterative learning algorithms, become overwhelmed by big data. While parallelism is an obvious and usually necessary strategy, we observe that (1) continually revisiting data and (2) visiting all data are two of the most prominent problems, especially for iterative, unsupervised algorithms like the expectation maximization algorithm for clustering (EM-T). Our strategy is to embed EM-T into a nonlinear hierarchical data structure (a heap) that allows us to (1) separate data that needs to be revisited from data that does not and (2) narrow the iteration toward the data that is more difficult to cluster. We call this extended EM-T algorithm EM*. We show that EM* outperforms EM-T on large real-world and synthetic data sets. Lastly, we conclude with some theoretical underpinnings that explain why EM* is successful.
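
To make the heap idea concrete, here is a minimal Python sketch of separating data that needs revisiting from data that does not. It is an illustration only and not the authors' EM* implementation: the function name `split_by_confidence`, the `revisit_fraction` parameter, and the use of the largest responsibility as a confidence score are all assumptions introduced for this example. Each point is pushed onto a min-heap for its most likely cluster, and the least confident points in each heap are popped for another pass.

```python
import heapq
import math


def split_by_confidence(responsibilities, revisit_fraction=0.5):
    """Partition point indices into (revisit, settled) lists.

    responsibilities: one row per point, one soft-assignment weight per cluster.
    revisit_fraction: hypothetical knob, the share of each cluster to revisit.
    """
    heaps = {}
    for idx, row in enumerate(responsibilities):
        # Assign the point to its most likely cluster.
        cluster = max(range(len(row)), key=row.__getitem__)
        # Min-heap keyed by confidence: least certain points surface first.
        heapq.heappush(heaps.setdefault(cluster, []), (row[cluster], idx))

    revisit, settled = [], []
    for heap in heaps.values():
        n_revisit = math.ceil(revisit_fraction * len(heap))
        for _ in range(n_revisit):
            revisit.append(heapq.heappop(heap)[1])
        # Whatever remains in the heap is treated as settled for this pass.
        settled.extend(idx for _, idx in heap)
    return sorted(revisit), sorted(settled)


# Toy responsibilities for four points and two clusters (made-up values).
resp = [
    [0.99, 0.01],
    [0.55, 0.45],
    [0.10, 0.90],
    [0.48, 0.52],
]
revisit, settled = split_by_confidence(resp, revisit_fraction=0.5)
print(revisit, settled)
```

In this toy input, the points whose two responsibilities are nearly equal are the ones selected for another pass, which mirrors the idea of narrowing iteration toward data that is harder to cluster.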


Journal: International Journal of Data Science and Analytics
Publication Date: Jul 5, 2017
Citations: 11

Similar Papers

EM*: An EM Algorithm for Big Data
Hasan Kurban ... Mehmet M Dalkilic
01 Oct 2016

A novel approach to optimization of iterative machine learning algorithms: Over heap structure
Hasan Kurban ... Mehmet M Dalkilic
01 Dec 2017

Case Study
Hasan Kurban ... Mehmet M Dalkilic
05 Dec 2017

Improving expectation maximization algorithm over stellar data
Hasan Kurban ... Mark Jenne
01 Dec 2017
