Abstract
Minoru Kawahara¹ and Hiroyuki Kawano²

¹ Data Processing Center, Kyoto University, Kyoto 606-8501, JAPAN, kawahara@kudpc.kyoto-u.ac.jp, http://www.kudpc.kyoto-u.ac.jp/~kawahara/index.html
² Department of Systems Science, Kyoto University, Kyoto 606-8501, JAPAN, kawano@i.kyoto-u.ac.jp, http://www.kuamp.kyoto-u.ac.jp/~kawano/index.html

To resolve or ease retrieval difficulties in bibliographic databases, we have been developing a bibliographic navigation system that implements our proposed mining algorithms [1]. The system shows keywords related to the query entered by a user and guides users toward appropriate bibliographies. The thresholds used by the association mining algorithm are usually set by the system administrator, so a method is needed for choosing thresholds that yield appropriate association rules for the bibliographic navigation system. In this paper, we propose a method that specifies the optimal thresholds based on ROC (Receiver Operating Characteristic) analysis [2], and we evaluate the performance of the method on our practical navigation system.

According to [2], ROC graphs have long been used in signal detection theory to depict tradeoffs between hit rate and false alarm rate. ROC graphs illustrate the behavior of a classifier without regard to class distribution or error cost, and so they decouple classification performance from these factors. The ROC convex hull method compares multiple classifiers on an ROC graph and identifies the optimal classifier, that is, the one that delivers the highest performance. An ROC graph characterizes each classifier by two parameters, the true positive rate TP and the false positive rate FP.
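As a minimal sketch of the two parameters TP and FP, the following Python function turns confusion-matrix counts into one point on an ROC graph. The function name `roc_point` and the example counts are hypothetical and are not taken from the navigation system.

```python
def roc_point(true_pos, false_neg, false_pos, true_neg):
    """Return (FP, TP): the false positive rate and the true positive rate.

    The four arguments are hypothetical confusion-matrix counts for one
    classifier (for example, one threshold setting of a mining algorithm).
    """
    positives = true_pos + false_neg   # all actually positive instances
    negatives = false_pos + true_neg   # all actually negative instances
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must contain at least one instance")
    tp_rate = true_pos / positives     # hit rate
    fp_rate = false_pos / negatives    # false alarm rate
    return fp_rate, tp_rate


# Illustrative counts only: 80 of 100 positives found, 10 of 100 negatives flagged.
fp_rate, tp_rate = roc_point(true_pos=80, false_neg=20, false_pos=10, true_neg=90)
print(f"FP = {fp_rate:.2f}, TP = {tp_rate:.2f}")
```

Returning the pair in (FP, TP) order matches the usual plotting convention of FP on the X axis and TP on the Y axis.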
If FP is plotted on the X axis and TP on the Y axis for several instances, the points trace a curve called the ROC curve. A curve drawn nearer to the point where TP is higher and FP is lower, that is, the most-northwest curve, is better. Although an ROC graph illustrates classification performance separately from class distribution and error cost, the ROC convex hull method can take them into account. Assume that c(classification, class) is a two-place error cost function, where c(n, P) is the cost of a false negative error and c(y, N) is the cost of a false positive error, and that p(P) is the prior probability of a positive instance, so that the prior probability of a negative instance is p(N) = 1 − p(P). Then the slope of an iso-performance line can be represented by p(N)/p(P) · c(y, N)/c(n, P).

S. Arikawa, K. Furukawa (Eds.): DS'99, LNAI 1721, pp. 333–334, 1999. © Springer-Verlag Berlin Heidelberg 1999

Table 1. Minsups at Rerror = 145 and the average distances from the point (1, 0) on the ROC graph. "AllPos" means deriving everything and "AllNeg" means deriving nothing.

Category  p(N)/p(P) · 1/Rerror  Optimal Minsup    ROC distance      ROC distance
                                (ROC algorithm)   (optimal Minsup)  (Minsup = 0.08)
1         0.0000 ∼ 0.2211       AllPos ∼ 0.02     0.8211            0.9477
2         0.2211 ∼ 0.7139       0.02 ∼ 0.04       0.8940            0.9725
3         0.7141 ∼ 2.2706       0.04 ∼ 0.25       0.9119            0.9008
4         2.2728 ∼ 7.1847       0.25 ∼ 0.40       0.9322            0.9857
5         7.2075 ∼ 22.565       0.40 ∼ 0.60       0.9926            0.9976
6         22.790 ∼ 69.076       0.60              0.9929            0.9968
7         71.235 ∼ 207.24       0.60              1.0262            1.0001
8         227.97 ∼ 569.93       AllNeg            1.0159            1.0000
9         759.91 ∼ 1139.9       AllNeg            1.0351            1.0000
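The iso-performance slope p(N)/p(P) · c(y, N)/c(n, P) can be used to pick an operating point. The sketch below relies on a standard property of iso-performance lines, not on the authors' implementation: among candidate points (FP, TP), the best one for a slope m lies on the line TP = m · FP + b with the largest intercept b, so it maximizes TP − m · FP. All names and the candidate points in `EXAMPLE_POINTS` are hypothetical values chosen for illustration; they are not measurements from Table 1.

```python
def iso_performance_slope(p_pos, cost_fp, cost_fn):
    """Slope p(N)/p(P) * c(y, N)/c(n, P) of an iso-performance line."""
    p_neg = 1.0 - p_pos
    return (p_neg / p_pos) * (cost_fp / cost_fn)


def best_operating_point(points, slope):
    """Return the label whose (FP, TP) point has the largest intercept TP - slope * FP."""
    return max(points, key=lambda label: points[label][1] - slope * points[label][0])


# Hypothetical (FP, TP) points. AllNeg derives nothing, AllPos derives everything.
EXAMPLE_POINTS = {
    "AllNeg": (0.0, 0.0),
    "minsup=0.40": (0.05, 0.40),
    "minsup=0.08": (0.30, 0.80),
    "AllPos": (1.0, 1.0),
}

for p_pos, cost_fp, cost_fn in [(0.5, 1.0, 10.0), (0.5, 1.0, 1.0), (0.5, 10.0, 1.0)]:
    slope = iso_performance_slope(p_pos, cost_fp, cost_fn)
    print(f"slope = {slope:.2f} -> {best_operating_point(EXAMPLE_POINTS, slope)}")
```

With these illustrative points, a shallow slope selects AllPos and a steep slope selects AllNeg, which is the same qualitative pattern as the first and last categories of Table 1.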