A Novel Similarity-Based Algorithm for Supervised Binary Classification: Sandalwood Odor Application

Walid Cherif, Abdellah Madani, Mohamed Kissi

doi:10.2139/ssrn.3186282

Abstract

Recent years have brought major advances in the field of data mining, driven largely by technologies that facilitate the analysis of data. Binary classification, in particular, is a subdomain of data mining that assigns data to one of two classes according to desired criteria. The algorithms developed for it can be broadly categorized as statistical or machine learning approaches, and each category has its own limitations. In this paper, a set of molecules is chosen to illustrate the proposed binary classification approach. The molecules are described by their size, shape, functionality and other expert descriptors. The paper defines new measure functions that quantify the similarities between molecules and then combines them in a new approach that differs from existing algorithms in its reliability computations. On this molecule dataset, the proposed approach outperformed most common classification techniques, achieving an F-measure above 70%.
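
The F-measure reported above is, in its usual balanced form (F1), the harmonic mean of precision and recall for the positive class. The abstract does not say which variant was used, so the sketch below assumes the standard F1 definition. The function name `f_measure` and the toy labels are hypothetical illustrations, not data or code from the paper.

```python
def f_measure(y_true, y_pred, positive=1):
    """Return the balanced F-measure (F1) for the given positive class.

    Assumes the standard definition F1 = 2 * P * R / (P + R), where
    P = TP / (TP + FP) is precision and R = TP / (TP + FN) is recall.
    """
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (not from the paper): 1 = sandalwood odor, 0 = other odor.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
print(f_measure(y_true, y_pred))  # TP=3, FP=1, FN=1 -> P = R = 0.75 -> 0.75
```

Because F1 is a harmonic mean, a classifier cannot reach a high score by favoring only precision or only recall, which makes it a common choice when one class (here, a specific odor) is the class of interest.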
