Predicting Interesting Web Pages Using SVM and Logit Regression

Dohong Jeon, Hyoungrae Kim

doi:10.9708/jksci.2015.20.3.047

Abstract

Automated detection of interesting web pages can be used in many application domains. Which pages a user finds interesting can be determined implicitly by observing the user's behavior. Distinguishing interesting web pages is a classification problem, and we choose white-box learning methods (fixed-effect logit regression and support vector machines) to test it empirically. The results indicate that (1) fixed-effect logit regression and fixed-effect SVMs with polynomial and radial basis kernels outperformed the linear-kernel model; (2) personalization is a critical issue for improving model performance; (3) when asking a user to grade web pages explicitly, a scale as simple as yes/no may suffice; and (4) for every additional second spent on a web page, the odds of the page being interesting increase by a factor of 1.004, whereas the number of scrollbar clicks (p=0.56) and the number of mouse clicks (p=0.36) had no statistically significant relation to interest.
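To give a feel for the size of the 1.004 figure in finding (4), the sketch below treats it as a per-second odds ratio from a logit model, with all other predictors held fixed. This is an illustration under that assumption, not the authors' code; the function names and the 0.5 baseline probability are hypothetical.

```python
import math

# Per-second odds ratio reported in the abstract (finding 4).
ODDS_RATIO_PER_SECOND = 1.004


def odds_multiplier(extra_seconds: float,
                    odds_ratio: float = ODDS_RATIO_PER_SECOND) -> float:
    """Factor by which the odds of 'interesting' change after extra dwell time."""
    return odds_ratio ** extra_seconds


def updated_probability(base_probability: float, extra_seconds: float) -> float:
    """Convert a baseline probability to odds, scale the odds, convert back."""
    base_odds = base_probability / (1.0 - base_probability)
    new_odds = base_odds * odds_multiplier(extra_seconds)
    return new_odds / (1.0 + new_odds)


# Logit coefficient implied by the odds ratio: beta = ln(1.004).
implied_beta = math.log(ODDS_RATIO_PER_SECOND)

if __name__ == "__main__":
    # Hypothetical baseline: a page with a 50% chance of being interesting.
    print(f"implied coefficient per second: {implied_beta:.5f}")
    print(f"odds multiplier after 60 s:     {odds_multiplier(60):.4f}")
    print(f"probability after 60 s:         {updated_probability(0.5, 60):.4f}")
```

Under this reading, one extra second barely moves the odds, but the effect compounds: a full extra minute on a page multiplies the odds by roughly 1.27.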
