Abstract

The main objective of this research is to report detailed empirical studies of sequential and parallel algorithms for diverse clustering tasks, executed on very large social network datasets using memory-efficient out-of-core approaches. We evaluate the Spark implementation for R on Cloudera, applying algorithms such as k-means and hierarchical clustering to social media review datasets in order to rank these algorithms. The implementation uses the YouTube dataset from the UCI Machine Learning Repository. Our goal is to compare a small set of algorithms and determine how accurately each model performs. Ultimately, we aim to test and rank clustering methods, and to mine and cluster massive amounts of unstructured data.
