Abstract

With advances in mobile technology and mobile Internet applications, smart mobile devices such as smartphones and tablets have become increasingly popular, and the number of Internet users worldwide continues to grow. In the Internet era, the amount of data is growing exponentially, and companies must be able to extract value from it. To meet these opportunities and challenges, a data platform must integrate the collection, storage, computation and analysis of massive amounts of data. In this study, the log data generated by Internet users browsing websites are analyzed, and the technologies used in the platform are briefly described. A draft design of a platform for the offline analysis of Internet user behavior data is then proposed, taking into account needs that are currently common across industries while incorporating some innovations. Three modules are designed and implemented: data collection, data warehouse and data visualization. The data collection module gathers the user data. The data warehouse module cleans, models and analyzes the data. In the data visualization module, tables are created in MySQL using the result data from the ADS layer as a template, the results are exported to MySQL periodically with the Sqoop tool, and the data are displayed with a data visualization tool. The platform is built with Flume, Kafka and Sqoop, with HDFS as the data storage framework, Hive as the data warehouse tool, and Spark as the computation engine for Hive, to analyze Internet user behavior in a big data context.
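The data warehouse step described above (cleaning raw browsing logs, then aggregating them into result tables for export) can be illustrated with a minimal sketch in plain Python. This is not the platform's implementation, which runs on Hive and Spark. The tab-separated log format (`timestamp`, `user_id`, `url`) and the function names `parse_line`, `daily_page_views` and `daily_active_users` are assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Optional, Set, Tuple


class PageView(NamedTuple):
    day: str      # calendar day taken from the timestamp, e.g. "2023-01-05"
    user_id: str
    url: str


def parse_line(line: str) -> Optional[PageView]:
    """Clean one raw log line; return None for malformed records.

    Assumed (hypothetical) format: "<timestamp>\t<user_id>\t<url>".
    """
    parts = line.strip().split("\t")
    if len(parts) != 3:
        return None
    timestamp, user_id, url = parts
    if not user_id or not url or len(timestamp) < 10:
        return None
    return PageView(day=timestamp[:10], user_id=user_id, url=url)


def daily_page_views(lines: Iterable[str]) -> Dict[Tuple[str, str], int]:
    """Count page views per (day, url), skipping malformed lines."""
    counts: Counter = Counter()
    for line in lines:
        view = parse_line(line)
        if view is not None:
            counts[(view.day, view.url)] += 1
    return dict(counts)


def daily_active_users(lines: Iterable[str]) -> Dict[str, int]:
    """Count distinct users per day, skipping malformed lines."""
    users: Dict[str, Set[str]] = {}
    for line in lines:
        view = parse_line(line)
        if view is not None:
            users.setdefault(view.day, set()).add(view.user_id)
    return {day: len(ids) for day, ids in users.items()}
```

In a pipeline like the one described, aggregates of this kind would correspond to the result tables that are exported to MySQL for visualization; here they are returned as ordinary dictionaries.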
