Abstract

Securing e-commerce sites has become a necessity because they process critical and sensitive data for customers and organizations. When a customer navigates through an e-commerce site, his or her clicks are recorded in a web log file. Analyzing these log files with data mining reveals many interesting patterns. The results are used in many different applications and, more recently, in detecting attacks on the web. To improve the quality of the data, and consequently of the mining results, the data in log files first needs to be preprocessed. In this paper, we discuss how web log files with different formats can be combined into one unified format using XML in order to track and extract more attacks. Because log files usually contain noisy and ambiguous data, this paper also shows how the data is preprocessed before the mining process is applied to detect attacks. We also discuss the difference between log preprocessing for web intrusion detection and for web usage mining.

Highlights

  • The destruction of trust in e-commerce applications may cause business operators and clients to forgo use of the Internet and revert to traditional methods of doing business

  • Web usage mining and web intrusion detection have different targets, and they differ in the type of data needed for the mining process

  • Many works have been developed and different tools are available to preprocess web log files for web usage mining

Summary

Introduction

The destruction of trust in e-commerce applications may cause business operators and clients to forgo use of the Internet and revert to traditional methods of doing business. One technique to detect web attacks is to analyze web server log files. The information in these files includes the user's IP address, the resource the user requests, the type of protocol used, and others (see Section II). Because these log files contain information about user access behavior on a web site, analyzing them can reveal patterns of web attacks. The purposes of web mining (analyzing log files using data mining techniques) are to identify potential users for e-commerce, enhance the quality of services provided to end users, improve web server performance, and others. Many works have been devoted to preprocessing data in log files for web usage mining. Web usage mining and web intrusion detection have different targets, and they differ in the type of data needed for the mining process.
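
To make the log fields mentioned above concrete, the following is a minimal sketch of how one line in the NCSA Common Log Format might be parsed and wrapped in an XML record. It is an illustration only, not the paper's implementation: the function name `clf_line_to_xml`, the `<entry>` element, the `source` attribute, the child element names, and the sample log line are all assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

# NCSA Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def clf_line_to_xml(line):
    """Parse one Common Log Format line into a hypothetical <entry> element.

    Returns None when the line does not match the expected layout.
    """
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    # The element and attribute names are illustrative, not the paper's schema.
    entry = ET.Element("entry", source="ncsa")
    for field, value in match.groupdict().items():
        ET.SubElement(entry, field).text = value
    return entry


# Made-up sample line, used only to exercise the parser.
sample = ('192.168.1.10 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326')
entry = clf_line_to_xml(sample)
print(ET.tostring(entry, encoding="unicode"))
```

Lines from other formats, such as W3C extended or IIS, would need their own patterns; mapping each of them onto the same element names is what would give a single unified record layout.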

Web logs
Referrer log file
NCSA format
W3C extended format
IIS format
Related Work
Preprocessing process
Data Integration
Data Cleansing
User Identification
Session Identification
Detected attacks from log files
Conclusion and future work
