Review on modern Data Preprocessing techniques in Web usage mining (WUM)

P Sukumar, L Robert, S Yuvaraj

doi:10.1109/csitss.2016.7779441

Abstract

The Web contains a huge amount of data that grows in volume and dimension day by day. Data mining applications that make use of Web data are referred to as Web mining, one of the most active topics in the field of data mining. Web mining is classified into three types based on the kind of knowledge extracted: Web structure mining, Web content mining, and Web usage mining. The Web usage mining process can be divided into three interdependent stages: data preprocessing, pattern discovery, and pattern analysis. This paper focuses on Web usage mining. Its contribution is an investigation of data preprocessing that determines the effectiveness of the algorithms and their limitations, and verifies their standing. Various preprocessing algorithms and their heuristics are implemented in programming languages, applied, and examined. Data preprocessing algorithms parse the raw log files, which involves splitting the log files, and then cleanse them to obtain higher-quality data. Based on this data, unique users are identified, which in turn helps to identify user sessions.
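
The preprocessing stages named in the abstract (parsing, cleaning, user identification, session identification) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify a log format or any heuristics, so the following are all assumptions made for illustration: input in the Combined Log Format, cleaning that keeps only successful GET requests for non-media resources from agents whose name does not contain "bot", users identified by the pair (IP address, user agent), and a 30-minute inactivity timeout for sessions. The names parse_line, is_relevant, and build_sessions are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumption: Combined Log Format; the referrer and agent fields are optional.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>[^\s"]+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)
# Assumption: these suffixes mark embedded resources rather than page views.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
# Assumption: 30 minutes of inactivity ends a session.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Parsing: split one raw log line into named fields, or return None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["agent"] = record["agent"] or ""
    return record


def is_relevant(record):
    """Cleaning: keep successful GET page requests from non-robot agents."""
    path = record["url"].split("?", 1)[0].lower()
    return (
        record["method"] == "GET"
        and 200 <= record["status"] < 300
        and not path.endswith(IGNORED_SUFFIXES)
        and "bot" not in record["agent"].lower()
    )


def build_sessions(lines):
    """User and session identification over an iterable of raw log lines.

    Returns a dict mapping (ip, agent) to a list of sessions, where each
    session is a time-ordered list of records.
    """
    records = [r for r in map(parse_line, lines) if r and is_relevant(r)]
    records.sort(key=lambda r: r["time"])
    sessions = {}
    for record in records:
        user = (record["ip"], record["agent"])
        user_sessions = sessions.setdefault(user, [])
        last = user_sessions[-1][-1]["time"] if user_sessions else None
        if last is not None and record["time"] - last <= SESSION_TIMEOUT:
            user_sessions[-1].append(record)   # same session continues
        else:
            user_sessions.append([record])     # timeout exceeded: new session
    return sessions
```

Calling build_sessions on the lines of a log file yields, for each identified user, the list of that user's sessions. Each assumed heuristic is isolated in its own constant or function, so an alternative (for example, a different timeout or a referrer-based user heuristic) could be swapped in and compared.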

Full Text