Fuzzy Discretization Considering the Spatial Distribution of Data and Its Application to Feature Selection

Chang-Sik Son, Hyoung-Seob Park, A-Mi Shin, In-Hee Lee, Yoon-Nyun Kim, Hee-Joon Park

doi:10.5391/jkiis.2010.20.2.165

Abstract

In clinical data mining, selecting an optimal subset of features is important not only for reducing the complexity of the model constructed from the given data but also for improving its usefulness. Moreover, the threshold values (i.e., cut-off points) of the selected features serve as decision criteria for clinical experts in the differential diagnosis of diseases. In this paper, we propose a fuzzy discretization method for data with continuous attributes that takes the spatial distribution of the data into account by evaluating the degree of separability of data containing redundant attribute values in the overlapping region. In the proposed method, the weighted average of the redundant attribute values is used to determine the threshold value (i.e., boundary value) of each feature, and rough set theory is used to select a subset of relevant features from the full feature set. To verify the validity of the proposed method, we ran experiments on data from 668 patients who presented with a chief complaint of dyspnea, comparing three discretization methods (equal-width, equal-frequency, and entropy-based) with the proposed method. The experimental results confirm that the discretization methods based on fuzzy partitioning give better average classification accuracy and G-mean than those based on hard partitioning.
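For readers unfamiliar with the baselines and the second evaluation measure named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the textbook definitions of equal-width and equal-frequency discretization, and it assumes the common binary-class definition of G-mean as the geometric mean of sensitivity and specificity; the abstract does not state which variant the paper uses. The function names are hypothetical.

```python
import math

def equal_width_cuts(values, k):
    """Return k-1 cut points that split [min, max] into k bins of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]

def equal_frequency_cuts(values, k):
    """Return k-1 cut points so that each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (assumed binary definition)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        return 0.0  # a class is absent, so one of the two rates is undefined
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

# Illustrative values only; these are not data from the paper.
values = [1, 2, 3, 4, 10, 20, 30, 40]
width_cuts = equal_width_cuts(values, 3)
freq_cuts = equal_frequency_cuts(values, 4)
score = g_mean([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])
```

Both baselines place hard boundaries without looking at class labels, which is the contrast the abstract draws with partitions that account for overlapping regions. G-mean is often reported alongside accuracy because it drops sharply when either class is poorly recognized.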

Journal: Journal of Korean Institute of Intelligent Systems
Publication Date: Apr 25, 2010
Citations: 2

Similar Papers

Discretization of Continuous Interval-Valued Attributes in Rough Set Theory and its Application
Guan Xin ... Yi Xiao
01 Jan 2007

Study on water quality analysis and early-warning technology based on rough set and evidence theory
20 Nov 2012

A Hybrid Feature Selection Scheme Based on Local Compactness and Global Separability for Improving Roller Bearing Diagnostic Performance
M M Manjurul Islam ... Jong-Myon Kim
27 Dec 2016

R-HEFS: Rough set based heterogeneous ensemble feature selection method for medical data classification
Rubul Kumar Bania ... Anindya Halder
Artificial Intelligence in Medicine | VOL. 114
06 Mar 2021
