A Study of Relative Causal Strength from the Viewpoint of Interestingness Measures

Hee Chang Park

doi:10.7465/jkdi.2017.28.1.49

Abstract

Among the techniques for analyzing big data, association rule mining searches for relationships among items using various association evaluation criteria. Depending on the direction in which rules are generated, association rules are classified as positive, negative, or inverse. This paper investigates, from the viewpoint of interestingness measures, which types of association rules the various relative causal strength measures can be applied to, and clarifies how these measures relate to the various types of confidence among the existing basic evaluation measures. As a result, if the rate of occurrence of the consequent item is 0.5 or more, the first measure (<TEX>$RCS_{IJ1}$</TEX>) proposed by Good (1961) is preferable to the first measure (<TEX>$RCS_{LR1}$</TEX>) proposed by Lewis (1986) because its value varies over a wider range; if the rate is less than 0.5, <TEX>$RCS_{LR1}$</TEX> is preferable to <TEX>$RCS_{IJ1}$</TEX>.
