Abstract
In the big data era, real-time analysis of log data has become an important requirement for Internet enterprises, because these data hold considerable hidden value. Map World, the National Platform for Common GeoSpatial Information Services in China, is a one-stop website providing geospatial information services to individual users, enterprises, professional agencies and governments. After seven years of development, traffic on the platform has increased significantly, reaching 200 million service requests per day. However, for lack of effective analysis and processing technology, the log data have not delivered their value, which has to some extent disconnected the top-level design of national common geospatial information services from the actual demands of users. The geospatial information service in China is now actively shifting from being driven by data production to being driven by demand, and how to understand user demands has become a pressing research issue. In addition, the access behaviour of group users to common geospatial information services has a social nature and follows a certain group access pattern. This pattern shows highly aggregated and spontaneous access, and it determines the demand of common geospatial information services for cloud computing resources. Part of these demands can be analysed from the log data. Therefore, a log analysis system offering unified real-time collection, real-time analysis, centralized storage and graphical display is key to meeting them. Flume, Kafka, Storm, Redis and HBase have been integrated to design and implement a distributed real-time log analysis system that supports online and offline log analysis. The system is composed of a log collection module, an asynchronous communication module, a real-time analysis and calculation module, a data caching and storage module, and a visualization module.
The system was released and successfully integrated with Map World in June 2017, and its implementation indicates that it can efficiently solve the problems of real-time log data collection, real-time analysis, real-time storage, real-time query, massive data storage, offline analysis, etc. It has played an important role in map data update, policy making, product decisions, online server load prediction, resource allocation optimization, Internet security improvement and operation funds evaluation.
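The module chain described in the abstract (collection, asynchronous communication, real-time analysis, caching and storage) can be illustrated with a minimal in-memory sketch. This is not the authors' implementation and uses none of the Flume, Kafka, Storm, Redis or HBase APIs: a standard-library queue stands in for the message channel, a counter for the cache, and a list for long-term storage. The log line format, the sample records and all function names are assumptions made purely for illustration.

```python
from collections import Counter
from queue import Queue

# Hypothetical log line format (an assumption, not taken from the paper):
# "<ISO timestamp> <client ip> <service name>"
SAMPLE_LOGS = [
    "2017-06-01T08:00:01 10.0.0.1 tile",
    "2017-06-01T08:00:02 10.0.0.2 geocode",
    "2017-06-01T08:00:40 10.0.0.1 tile",
    "2017-06-01T08:01:05 10.0.0.3 tile",
]


def collect(lines, channel):
    """Collection stage (the role Flume plays): push raw lines into the channel."""
    for line in lines:
        channel.put(line)
    channel.put(None)  # sentinel marking the end of the stream


def analyze(channel, cache, archive):
    """Analysis stage (the role Storm plays): count requests per minute and service."""
    while True:
        line = channel.get()
        if line is None:
            break
        timestamp, client_ip, service = line.split()
        minute = timestamp[:16]  # truncate "YYYY-MM-DDTHH:MM:SS" to the minute
        # Stand-in for the cache (the role Redis plays): rolling counters.
        cache[(minute, service)] += 1
        # Stand-in for long-term storage (the role HBase plays): keep every record.
        archive.append({"ts": timestamp, "ip": client_ip, "service": service})


def run_pipeline(lines):
    """Wire the stages together; the Queue stands in for the Kafka channel."""
    channel = Queue()
    cache = Counter()
    archive = []
    collect(lines, channel)
    analyze(channel, cache, archive)
    return cache, archive
```

Calling `run_pipeline(SAMPLE_LOGS)` returns the per-minute counters, which a visualization layer could query for online analysis, and the full record list, which would remain available for offline analysis. In a real deployment the stages would run concurrently on separate nodes rather than sequentially in one process.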
Highlights
As unstructured records produced during the operation of software systems, logs are a management tool for recording the behaviours of systems and network users, describing application service behaviour and user interaction (Qu G., 2016).
Logs are characterized by huge data volume, high complexity and strong real-time demands, so handling them places higher requirements on real-time computing and massive storage capability throughout the whole process.
In the era of cloud computing and big data, the geospatial information service is gradually transforming from being driven by data production to being driven by user demands, and the core issue for this transformation is how to understand users' demands by extracting useful information from the log data promptly and effectively (Wu H., Li R., Zhou Z., Jiang J., & Gui Z., 2015).
Summary
As unstructured records produced during the operation of software systems, logs are a management tool for recording the behaviours of systems and network users, describing application service behaviour and user interaction (Qu G., 2016). Through statistics and analysis, logs can help product managers and decision makers find potential production problems, so as to promote product replacement, improve user experience and optimize product operation. Real-time streaming data processing technology for massive log data has been applied widely on the Internet, for example in real-time monitoring, real-time recommendation and real-time statistics (Liu F., 2017). The real-time collection and analysis of massive logs has become an important big data service in Internet firms, and it is the main method for understanding user behaviour in depth, evaluating marketing effectiveness, optimizing the product experience and improving operational efficiency. With the deep and wide application of Map World, the number of platform users has increased rapidly, and the platform now records over 200 million page and service requests every day, which challenges the existing log data collection and analysis system. Neither the storage mode nor the calculation efficiency, the traditional log collection and analysis technology
More From: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences