Abstract
Ranked set sampling and some of its variants have been applied successfully in areas such as industrial statistics, economics, environmental and ecological studies, biostatistics, and statistical genetics. Ranked set sampling is a sampling method that is more efficient than simple random sampling. It is also well known in parametric inference that a ranked set sample (RSS) carries more Fisher information about the unknown parameter of the underlying distribution than a simple random sample (SRS) of the same size. In this paper, we consider the Farlie-Gumbel-Morgenstern (FGM) family and study information measures such as Shannon entropy, Rényi entropy, mutual information, and Kullback-Leibler (KL) information for RSS data. We also investigate their properties and compare them with those of SRS data.
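As a minimal numerical sketch (not from the paper) of the entropy comparison described above, the code below checks that a balanced RSS of size n carries no more Shannon entropy than an SRS of the same size, using the fact that the RSS entropy decomposes into a sum of order-statistic entropies. It uses a standard normal parent distribution rather than the FGM setup of the paper, purely to illustrate the generic decomposition; all names and choices here are the sketch's own assumptions.

import numpy as np
from scipy import integrate, special
from scipy.stats import norm

def order_stat_pdf(x, i, n, f, F):
    # pdf of the i-th order statistic of an i.i.d. sample of size n:
    # n!/((i-1)!(n-i)!) * F(x)^(i-1) * (1-F(x))^(n-i) * f(x)
    coef = i * special.binom(n, i)
    return coef * F(x) ** (i - 1) * (1.0 - F(x)) ** (n - i) * f(x)

def shannon_entropy(pdf, lo=-10.0, hi=10.0):
    # H(g) = -integral of g(x) log g(x) dx, computed numerically
    def integrand(x):
        p = pdf(x)
        return -p * np.log(p) if p > 0 else 0.0
    return integrate.quad(integrand, lo, hi)[0]

n = 5
h_srs = n * shannon_entropy(norm.pdf)  # n i.i.d. draws: entropy adds up
h_rss = sum(
    shannon_entropy(lambda x, i=i: order_stat_pdf(x, i, n, norm.pdf, norm.cdf))
    for i in range(1, n + 1)           # one independent order statistic per rank
)
print(f"SRS entropy: {h_srs:.4f}, RSS entropy: {h_rss:.4f}")  # expect RSS <= SRS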
Highlights
As far as the methods are concerned, Synthetic Minority Oversampling Technique (SMOTE) SVM has the highest performance compared to C SVM, Two Cost SVM (TC SVM), and SVM-RU (a sketch contrasting SMOTE SVM and TC SVM follows these highlights)
An alternative cost-sensitive SVM (TC SVM) strategy was used, since classic SVMs have proved inappropriate for dealing with imbalanced datasets
We investigated the effect of incorporating the TC SVM on a learned SVM model using a medical dataset
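The sketch below contrasts the two approaches named in the highlights: SMOTE-based oversampling followed by a standard SVM, versus a two-cost SVM emulated via scikit-learn's class_weight. It is not the paper's experiment; the synthetic dataset stands in for the medical data, and it assumes scikit-learn and imbalanced-learn are installed.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import balanced_accuracy_score
from imblearn.over_sampling import SMOTE

# Synthetic imbalanced data (90% majority / 10% minority), a stand-in
# for the medical dataset used in the paper.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# SMOTE SVM: oversample the minority class, then fit a standard C SVM.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
smote_svm = SVC(kernel="rbf").fit(X_res, y_res)

# TC SVM analogue: leave the data as-is but penalize errors on each class
# differently; class_weight="balanced" scales C inversely to class frequency.
tc_svm = SVC(kernel="rbf", class_weight="balanced").fit(X_tr, y_tr)

for name, clf in [("SMOTE SVM", smote_svm), ("TC SVM", tc_svm)]:
    print(name, balanced_accuracy_score(y_te, clf.predict(X_te)))

Balanced accuracy is used here because plain accuracy is dominated by the majority class on imbalanced data; the paper may well report different metrics.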
Summary
Introduction and motivation
Support vector machines (SVMs), a powerful machine learning technique, were introduced by Vapnik (Vapnik (1995); Cortes and Vapnik (1995); see also Burges (1998), Cristianini and Shawe-Taylor (2000), and Scholkopf and Smola (2001)) and have been applied successfully to various real-world problems, ranging from image retrieval (Tong and Chang (2001)) and handwriting recognition (Cortes (1995)) to face detection (Osuna et al. (1997)) and speaker identification (Schmidt (1996)).
Support Vector Machines Classification on Class Imbalanced Data
The issue of imbalanced data is recognized as a crucial problem in the machine learning community (Chawla et al. (2004)). In such cases, classifiers tend to be overpowered by the majority class and to ignore the minority examples, because they assume equal misclassification costs. Numerous recent works, including preprocessing and algorithmic methods, have been proposed to deal with this problem. These techniques can be sorted into two categories: preprocessing the data, by oversampling the minority instances or undersampling the majority instances, and algorithmic methods, including cost-sensitive learning (Batuwita and Palade (2013)). The soft margin optimization problem (Cortes and Vapnik (1995)) can be formulated as

\min_{w,\,b,\,\xi} \; \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad y_i (w^{\top} x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \dots, n,

where the slack variables \xi_i measure margin violations and the penalty C > 0 trades off margin width against training error.
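The Two Cost SVM mentioned in the highlights is typically obtained from this soft margin problem by splitting the single penalty C into class-specific penalties. Shown for context below is a standard two-cost formulation from the cost-sensitive SVM literature, in illustrative notation that need not match the paper's own:

% A standard two-cost (cost-sensitive) soft-margin objective; the
% constraints are the same as in the soft margin problem above.
\min_{w,\,b,\,\xi} \; \frac{1}{2}\|w\|^{2}
  + C^{+} \sum_{i\,:\,y_i = +1} \xi_i
  + C^{-} \sum_{i\,:\,y_i = -1} \xi_i
\quad \text{subject to} \quad
  y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \;\; \xi_i \ge 0 .

Choosing a larger penalty for the minority class makes its misclassification more costly, which counteracts the tendency of the classifier to be overpowered by the majority class.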