Abstract
Software systems for social data mining provide algorithms and tools for extracting useful knowledge from user-generated social media data. ParSoDA (Parallel Social Data Analytics) is a high-level library for developing parallel data mining applications based on the extraction of useful knowledge from large data set gathered from social media. The library aims at reducing the programming skills needed for implementing scalable social data analysis applications. To reach this goal, ParSoDA defines a general structure for a social data analysis application that includes a number of configurable steps and provides a predefined (but extensible) set of functions that can be used for each step. User applications based on the ParSoDA library can be run on both Apache Hadoop and Spark clusters. The paper describes the ParSoDA library and presents two social data analysis applications to assess its usability and scalability. Concerning usability, we compare the programming effort required for coding a social media application using versus not using the ParSoDA library. The comparison shows that ParSoDA leads to a drastic reduction (i.e., about 65%) of lines of code, since the programmer only has to implement the application logic without worrying about configuring the environment and related classes. About scalability, using a cluster with 300 cores and 1.2 TB of RAM, ParSoDA is able to reduce the execution time of such applications up to 85%, compared to a cluster with 25 cores and 100 GB of RAM.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.