Abstract
Analysis of online user-generated content is receiving attention for its wide applications from both academic researchers and industry stakeholders. In this pilot study, we address common Big Data problems of time constraints and memory costs involved with using standard single-machine hardware and software. A novel Big Data processing framework is proposed to investigate a niche subset of user-generated popular culture content on Douban, a well-known Chinese-language online social network. Huge data samples are harvested via an asynchronous scraping crawler. We also discuss how to manipulate heterogeneous features from raw samples to facilitate analysis of various film details, review comments, and user profiles on Douban with specific regard to a wave of South Korean films (2003–2014), which have increased in popularity among Chinese film fans. In addition, an improved Apriori algorithm based on MapReduce is proposed for content-mining functions. An exploratory simulation of results demonstrates the flexibility and applicability of the proposed framework for extracting relevant information from complex social media data, knowledge which can in turn be extended beyond this niche dataset and used to inform producers and distributors of films, television shows, and other digital media content.
Highlights
The last decade has witnessed the dramatic expansion of online social networks at the global level
We introduce the Douban-Learning framework in “Experimental results”, where three major modules are discussed in terms of data harvesting, feature generation and content mining
Experimental results: This section presents experimental results following application of the improved Apriori algorithm to features extracted for content mining
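To make the content-mining step concrete, the following is a minimal sketch of how Apriori-style support counting can be arranged as a map step and a reduce step. It is not the improved algorithm proposed in the paper, whose details are not given here, and it omits Apriori's candidate pruning: it simply counts every k-item subset. The function names, the feature tags, and the toy review records are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, k):
    """Map step: emit (itemset, 1) for every k-item subset of one record."""
    items = sorted(set(transaction))
    return [(frozenset(combo), 1) for combo in combinations(items, k)]


def reduce_phase(pairs):
    """Reduce step: sum the emitted counts for each itemset."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return counts


def frequent_itemsets(transactions, k, min_support):
    """Return the k-itemsets that occur in at least min_support records."""
    emitted = [pair for t in transactions for pair in map_phase(t, k)]
    counts = reduce_phase(emitted)
    return {s: c for s, c in counts.items() if c >= min_support}


# Hypothetical toy data: one set of feature tags per review (not from the paper).
reviews = [
    {"genre:romance", "rating:high", "country:KR"},
    {"genre:romance", "rating:high"},
    {"genre:action", "rating:low", "country:KR"},
    {"genre:romance", "country:KR", "rating:high"},
]

frequent = frequent_itemsets(reviews, k=2, min_support=2)
for itemset, support in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

In a real MapReduce deployment the map step would run in parallel over partitions of the records and the framework would group the emitted pairs by itemset before the reduce step; here both steps run in a single process purely for illustration.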
Summary
The last decade has witnessed the dramatic expansion of online social networks (hereafter OSNs) at the global level. More and more people are employing OSNs in their day-to-day lives to access information, express opinions and share experiences with their peers. Massive volumes of content are generated every day from numerous social media channels. A significant proportion of online content is associated with the film domain, as many OSNs (such as Rotten Tomatoes, FilmCrave and Twitter) provide cinema fans with convenient mechanisms for posting and sharing their opinions or comments about movies online. Prospective audiences are increasingly inclined to rely on online reviews to make their own viewing choices, as well as to build a list of films upon which they might comment (but not necessarily see).