Abstract

Feature selection (FS) plays a crucial role in machine learning, helping to build robust models for learning or classification from large amounts of data. Among feature selection techniques, the Relief algorithm is one of the most common because of its simplicity and effectiveness. Its performance, however, can be strongly affected by the consistency of the data patterns. For instance, Relief-F can become less accurate in the presence of noise, and its accuracy decreases further if outlier samples are included in the dataset. It is therefore important to choose carefully which samples to include. This paper presents an effort to improve the effectiveness of the Relief algorithm by filtering samples before selecting features, a method termed the Sample Filtering Relief Algorithm (SFRA). The main idea is to separate outlier samples from the main pattern using a self-organizing map (SOM) and then to proceed with feature selection using the Relief algorithm. We tested SFRA on a gene expression dataset of interferon-α (IFN-α) response in Hepatitis B patients that contains outlier data. SFRA successfully removed the outlier samples, which were verified by expert visual inspection. It also separated relevant from irrelevant features more accurately than the other feature selection methods considered.
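The two-stage idea described above (filter samples with a SOM, then run Relief on what remains) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the filtering criterion, so the rule used here (drop samples whose best-matching SOM unit attracts fewer than `min_hits` samples) is an assumption, as are the function names `train_som`, `filter_samples`, `relief`, and `sfra_sketch`, the grid size, and the learning schedule. The feature-weighting step is basic binary Relief, not Relief-F, and it assumes at least two samples per class survive filtering.

```python
import numpy as np


def train_som(X, grid=(3, 3), n_iter=500, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small SOM; returns unit weights of shape (rows*cols, n_features)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of each unit, used by the neighborhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize unit weights from randomly chosen samples.
    W = X[rng.integers(0, len(X), size=rows * cols)].copy()
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighborhood radius
        x = X[rng.integers(0, len(X))]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))     # best-matching unit
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        W += lr * h[:, None] * (x - W)
    return W


def filter_samples(X, W, min_hits=2):
    """Assumed rule: keep samples whose best-matching unit is shared by >= min_hits samples."""
    X = np.asarray(X, dtype=float)
    bmus = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)
    hits = np.bincount(bmus, minlength=len(W))
    return hits[bmus] >= min_hits


def relief(X, y):
    """Basic Relief weights for binary labels, visiting every sample once."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                        # avoid division by zero
    Xn = (X - X.min(axis=0)) / span              # scale features to [0, 1]
    n, p = Xn.shape
    w = np.zeros(p)
    for i in range(n):
        dist = np.abs(Xn - Xn[i]).sum(axis=1)    # Manhattan distance to sample i
        dist[i] = np.inf                         # exclude the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample
        w += np.abs(Xn[i] - Xn[miss]) - np.abs(Xn[i] - Xn[hit])
    return w / n


def sfra_sketch(X, y, **som_kwargs):
    """Filter samples with the SOM, then score features on the kept samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = filter_samples(X, train_som(X, **som_kwargs))
    return keep, relief(X[keep], y[keep])
```

Calling `sfra_sketch(X, y)` returns a boolean mask of retained samples and one Relief weight per feature; higher weights indicate features that better separate the two classes among the retained samples.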
