Abstract

Rapid advances in data digitization, powerful new learning methodologies for data collection, and the falling cost of data storage have filled the World Wide Web with immense amounts of data, generated at a significant rate across all key domains. This web data is non-scalable, high-dimensional, widely distributed, heterogeneous, and dynamic, yet it carries useful insights, and it has therefore evolved into big data. This situation creates growing opportunities for big data researchers to extract structured solutions from unstructured weblog data. Moreover, web analysts in the big data era have focused on adding value to key domains and deriving actionable knowledge for applications such as web usage analysis for improved fraud detection, product analysis, and customer segmentation. To improve operational performance and discover hidden insights accurately, a comprehensive process is required for investigating web user behavior by analyzing big web data. To this end, the authors review the techniques and technologies of web data collection and preparation for investigating web user behavior effectively. The paper first explores weblog data preparation methods in the traditional approach. The review then turns to the Hadoop approach for big data preparation and processing. This approach addresses both stages comprehensively, distributed data storage and parallel processing of weblog data, and leverages the strengths of the techniques and technologies of each stage. Finally, the authors review potential research paths that could lead to improved methodologies for data storage and optimized processing speed in the era of big web data.
