Abstract

A key business concern is knowing your customer. One legacy carried into the present from the earliest NCSA web servers is the web server log. While there are more powerful user tracking techniques, such as requiring logins or storing cookies, server logs remain a powerful tool for understanding customer activity on a web site, and they are the only tool when logins are not desirable or when cookies are blocked by browsers or firewalls. This paper details the possibilities and pitfalls of using web server logs to understand customer behavior on a web site. It describes the information recorded by the server and the legitimate inferences that can be made from that data. Special emphasis is given to case studies that demonstrate the interactions of HTTP and HTML, and how weaknesses in the current specifications can confound the recorded data and lead to an incorrect analysis.

[...] customer behavior that can be learned from them. Before delving into the precise details of server logs, the first case study illustrates why a business might care about them. This example also strongly hints that a cursory analysis is often insufficient, a theme pursued throughout the paper. All data used here are from actual client e-commerce sites, but for confidentiality the names of the sites are withheld and only relative traffic levels are presented.

A downturn in monthly sales was noted at an e-commerce web site. A quick analysis of the server logs indicated that the site's traffic had also fallen. Looking deeper into the data revealed that most of the site's traffic came from customers clicking on links to the e-commerce site from the company's main web site. This pattern of behavior was detected using the referer (sic) field recorded in the server logs. Careful review of the traffic referred from the main site revealed a sharp decrease.
Figure 1 uses the contents of the referer field to visualize the drop in customers directed from the main web site. The horizontal axis is time, and the height of each bar represents one week's worth of customers referred to the e-commerce site. Further inquiry revealed that the main site had undergone a major redesign, which coincided with the first precipitous drop. We were initially unsure why the traffic continued to fall, but hypothesized that a continuous process of incremental redesign during this period may have contributed to the continuing decline.
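
As an illustration of how weekly referral counts like those behind Figure 1 might be derived, the following is a minimal sketch and not the analysis code used for this case study. It assumes the logs are in the common "combined" format, in which the referer is the second-to-last quoted field; the excerpt above does not state the actual log format. The names LOG_LINE, weekly_referrals, and referring_host are hypothetical. Note also that the sketch counts requests rather than distinct customers, so it would overstate referrals wherever one visitor generates several hits.

```python
import re
from collections import Counter
from datetime import datetime
from urllib.parse import urlsplit

# Assumed layout (combined log format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def weekly_referrals(lines, referring_host):
    """Count requests per ISO (year, week) whose referer host is referring_host."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # malformed line, or a format without a referer field
        try:
            host = urlsplit(match.group("referer")).hostname
        except ValueError:
            continue  # referer value is not a parseable URL
        if host != referring_host:
            continue  # an empty referer ("-") has no hostname and lands here
        # Timestamps look like 03/Jan/2000:10:00:00 -0500 (English month names).
        when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        iso = when.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return counts
```

Called with the lines of a log file and the host name of the main site, the function returns a mapping from ISO year and week to a hit count, which can then be plotted as one bar per week. Grouping by ISO week is a choice made for this sketch; any consistent seven-day bucket would serve the same purpose.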
