Web site keywords: A methodology for improving gradually the web site text content

Juan D Velasquez

doi:10.3233/ida-2012-0526

Abstract

The construction of a web site is a major challenge that integrates different elements such as the hyperlink structure, colors, pictures, movies and textual content. Of these, the right textual content can be the key to attracting users to the site. In fact, many users reach a web site through a web search engine such as Google or Yahoo!, and continue exploring the site if it contains the information they are looking for. In this paper, a methodology to extract the main words in a static web site is proposed. These words are called web site keywords, and by using them in the site's textual content, significant improvements from the user's point of view can be achieved. A key element of the methodology is determining which pages in a web site best attract users' attention while they are browsing the site. Web users can be classified by browsing behaviour into two categories: amateur and experienced. An amateur user has little or no experience with web-based systems; their browsing behaviour is normally erratic, and it can take them a considerable amount of time to find what they are looking for. An experienced user has more experience with web-based systems; their behaviour is more controlled and purpose-driven, so they need less time to determine whether the site contains worthwhile information. Importantly, for experienced web users there is a correlation between the amount of time spent on a web page during a session and their degree of interest in the page content. Using this characteristic, a feature vector is built from the time spent on each page during a user's session. These vectors are the input for two clustering algorithms, a self-organizing feature map (SOFM) and K-means, which enable the extraction of significant patterns about users with similar or identical browsing behaviour and content preferences.
These patterns then form the basis for identifying the web site keywords. To validate the proposed methodology, web data originating from a complex static web site belonging to a Chilean bank were used. From the clusters found, a set of web site keywords was identified and its utility was tested on a group of real users, illustrating the effectiveness of the proposed methodology.
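The pipeline outlined in the abstract (time-based session vectors, clustering, keyword identification) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the paper's implementation: the toy pages, texts and sessions are invented, the function names are hypothetical, only a plain K-means is shown (no SOFM), and the keyword score (word count weighted by the cluster's share of time on each page) is a simple stand-in for whatever weighting the full methodology defines.

```python
from collections import Counter

def session_vector(session, pages):
    """Map a session [(page, seconds), ...] to the fraction of time spent on each page."""
    total = sum(seconds for _, seconds in session)
    spent = Counter()
    for page, seconds in session:
        spent[page] += seconds
    return [spent[p] / total if total else 0.0 for p in pages]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's K-means; the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def cluster_keywords(centroid, pages, page_text, top_n=3):
    """Rank words by count, weighted by the cluster's time share on each page (assumed score)."""
    scores = Counter()
    for page, weight in zip(pages, centroid):
        for word, count in Counter(page_text[page].lower().split()).items():
            scores[word] += weight * count
    return [word for word, _ in scores.most_common(top_n)]

# Hypothetical toy site and sessions (seconds spent per page visit).
pages = ["home", "loans", "cards"]
page_text = {
    "home": "welcome bank news",
    "loans": "mortgage loan rates loan",
    "cards": "credit card rewards card",
}
sessions = [
    [("home", 5), ("loans", 60), ("loans", 35)],
    [("home", 10), ("cards", 90)],
    [("loans", 80), ("home", 20)],
    [("cards", 70), ("home", 30)],
]

vectors = [session_vector(s, pages) for s in sessions]
labels, centroids = kmeans(vectors, k=2)
for c, centroid in enumerate(centroids):
    print(c, cluster_keywords(centroid, pages, page_text, top_n=1))
```

Normalizing each session by its total duration keeps long and short sessions comparable, so clusters reflect where attention went and not how long the visit lasted; this is a design choice of the sketch, not something stated in the abstract.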

Full Text