Abstract

Streaming data is very large and arrives continuously at a rapid rate, and its characteristics may vary over time. Scalability, continuous availability, workload diversity, data security, and manageability are among the challenges of big data. Because of the sheer size of the data, analysis is difficult and time-consuming. Rather than analyzing the entire streaming dataset, sampling offers an alternative that allows efficient analysis and thereby reduces computation time. A sample should represent the properties of the entire dataset. Sampling techniques such as reservoir sampling (RS) are used to extract the sample. In our proposed work, a Twitter dataset is extracted via the Twitter Application Programming Interface (API) and analyzed using a sentiment analysis (SA) technique. SA is used to determine the polarity of the tweets and is applied to the sample datasets, which are extracted from the complete dataset using reservoir sampling techniques. Finally, the obtained results are analyzed, and we observe that the sampling technique is precise for the Twitter dataset.
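
To illustrate how a fixed-size sample can be drawn from a stream of unknown length, the following is a minimal sketch of the classic single-pass reservoir sampling scheme (often called Algorithm R). The abstract does not state which RS variant is used, so the choice of variant is an assumption, and the function name `reservoir_sample` and the parameters `k` and `seed` are illustrative names rather than anything taken from the paper.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k items chosen uniformly at random from a stream.

    Hypothetical helper for illustration: it reads the stream once and
    keeps at most k items in memory, whatever the stream length.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item replaces a stored one
            # only if the slot falls inside the reservoir (probability k / (i + 1)).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Made-up stand-ins for tweets arriving from a stream.
    tweets = (f"tweet {n}" for n in range(10_000))
    sample = reservoir_sample(tweets, k=5, seed=42)
    print(sample)
```

In a pipeline like the one described, the sampled items, rather than the full stream, would then be passed to the sentiment analysis step. Each item ends up in the sample with equal probability, which is what allows the sample to stand in for the complete dataset.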
