Abstract. In the big data era, real-time analysis of log data has become an important requirement for Internet enterprises, because these data hold considerable hidden value. Map World, the National Platform for Common GeoSpatial Information Services in China, is a one-stop website providing geospatial information services to personal users, enterprises, professional agencies and governments. After 7 years of development, traffic to the platform has increased significantly, reaching 200 million service requests per day. However, for lack of effective analysis and processing technology, the log data have not delivered their value, which has led, to some extent, to a disconnect between the top-level design of national common geospatial information services and the actual demands of users. Geospatial information services in China are now actively trying to shift from being driven by data production to being driven by demand, and understanding user demands has become a pressing research issue. In addition, group users' access to common geospatial information services has a social nature and follows certain group access behaviour patterns. These patterns show highly aggregated and spontaneous access, and they determine the demand of common geospatial information services for cloud computing resources. Some of these demands can be analysed from the log data. A log analysis system providing unified real-time collection, real-time analysis, centralized storage and graphical display is therefore key to supporting these demands. Flume, Kafka, Storm, Redis and HBase have been integrated to design and implement a distributed real-time log analysis system that supports both online and offline log analysis. The system is composed of a log collection module, an asynchronous communication module, a real-time analysis and calculation module, a data caching and storage module, and a visualization module.
The system was released and successfully integrated with Map World in June 2017. Its deployment indicates that it can efficiently solve problems such as real-time log data collection, real-time analysis, real-time storage, real-time query, massive data storage and offline analysis. It has played an important role in map data updates, policy making, product decisions, online server load prediction, resource allocation optimization, Internet security improvement and operation funds evaluation.