Research on bioinformatics data classification method based on support vector machine

Similar Papers
  • Research Article
  • Citations: 21
  • 10.4236/iim.2010.26043
Intelligent Optimization Methods for High-Dimensional Data Classification for Support Vector Machines
  • Jan 1, 2010
  • Intelligent Information Management
  • Sheng Ding + 1 more

Support vector machine (SVM) is a popular pattern classification method with many application areas, and it performs particularly well in high-dimensional data classification. The kernel parameter settings used during SVM training, along with feature selection, significantly influence classification accuracy. This paper proposes two novel intelligent optimization methods that simultaneously determine the parameter values and discover a subset of features to increase SVM classification accuracy. The study focuses on two evolutionary computing approaches to optimize the parameters of SVM: particle swarm optimization (PSO) and genetic algorithm (GA). We combine these two intelligent optimization methods with SVM to choose appropriate feature subsets and SVM parameters; the resulting models are termed GA-FSSVM (Genetic Algorithm-Feature Selection Support Vector Machines) and PSO-FSSVM (Particle Swarm Optimization-Feature Selection Support Vector Machines). Experimental results demonstrate that the proposed methods achieve higher classification accuracy than the traditional grid search approach and many other approaches. Moreover, the results indicate that PSO-FSSVM obtains higher classification accuracy than GA-FSSVM for hyperspectral data.
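
The joint search over SVM parameters and a feature subset described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' PSO-FSSVM: the particle encoding (two coordinates for log10 of C and gamma, then one score per feature thresholded at 0.5), the inertia and acceleration constants, and `toy_fitness` are all assumptions made here. In practice the fitness would be the cross-validated accuracy of an SVM trained with the decoded parameters on the selected features.

```python
import math
import random


def decode(position, n_features):
    """Split a particle position into (C, gamma, feature mask)."""
    # Assumed encoding: position[0] = log10(C), position[1] = log10(gamma),
    # remaining coordinates are per-feature scores thresholded at 0.5.
    mask = [score > 0.5 for score in position[2:2 + n_features]]
    return 10.0 ** position[0], 10.0 ** position[1], mask


def toy_fitness(c, gamma, mask):
    """Hypothetical stand-in for cross-validated SVM accuracy."""
    target = [True, True, False, False]  # pretend only the first two features help
    agree = sum(m == t for m, t in zip(mask, target)) / len(target)
    penalty = 0.05 * (abs(math.log10(c) - 1.0) + abs(math.log10(gamma) + 1.0))
    return agree - penalty


def pso(fitness, n_features, n_particles=20, n_iters=50, seed=0):
    """Maximize fitness(C, gamma, mask) with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = 2 + n_features
    lo = [-3.0, -3.0] + [0.0] * n_features  # lower bounds per coordinate
    hi = [3.0, 3.0] + [1.0] * n_features    # upper bounds per coordinate
    pos = [[rng.uniform(lo[d], hi[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [fitness(*decode(p, n_features)) for p in pos]
    g = max(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social weights (assumed)
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi[d], max(lo[d], pos[i][d] + vel[i][d]))
            fit = fitness(*decode(pos[i], n_features))
            if fit > best_fit[i]:
                best_fit[i], best_pos[i] = fit, pos[i][:]
                if fit > g_fit:
                    g_fit, g_pos = fit, pos[i][:]
    return decode(g_pos, n_features), g_fit
```

Calling `pso(toy_fitness, n_features=4)` returns the best decoded `(C, gamma, mask)` triple together with its fitness. A grid search, by contrast, would fix the feature set and enumerate only the `(C, gamma)` pairs.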

  • Research Article
  • Citations: 1
  • 10.13088/jiis.2012.18.2.029
Ensemble Learning with Support Vector Machines for Bond Rating
  • Jan 1, 2012
  • Myoung-Jong Kim

Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings by applying statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rest on strict assumptions, including linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and support vector machines (SVM). In particular, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically and leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, and thus overfitting is unlikely to occur with SVM. Furthermore, SVM does not require many data samples for training, since it builds prediction models using only some representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can degrade SVM's performance.
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not perform as well on multi-class classification problems as SVM does on binary-class classification. Second, approximation algorithms (e.g., decomposition methods and the sequential minimal optimization algorithm) can be used to reduce computation time in multi-class problems, but they can deteriorate classification performance. Third, multi-class prediction problems are prone to data imbalance, which occurs when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often cause a default classifier to be built due to a skewed boundary, and thus reduce the classification accuracy of the classifier. SVM ensemble learning is one machine learning method for coping with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than those that are correctly predicted. Thus boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) method to resolve the multiclass prediction problem.
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can take into account the geometric mean-based accuracy and errors across multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross-validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we have obtained results for the classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds is significantly different. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
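
The gap between the arithmetic and geometric figures reported above is easier to see with a small worked example. The sketch below is illustrative only. It assumes the geometric mean-based accuracy is the geometric mean of per-class recalls (a common definition, not confirmed by the abstract) and uses plain overall accuracy as the arithmetic counterpart. The function names and the toy labels are hypothetical.

```python
import math


def per_class_recall(y_true, y_pred):
    """Return the recall of each class, in sorted class order."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return recalls


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of per-class recalls; zero if any class is never hit."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls) ** (1.0 / len(recalls))


# Toy imbalanced data: 8 samples of class 0, 2 samples of class 1.
y_true = [0] * 8 + [1] * 2
always_majority = [0] * 10            # ignores the minority class entirely
half_minority = [0] * 8 + [1, 0]      # recovers one of two minority samples

print(overall_accuracy(y_true, always_majority))         # 0.8
print(geometric_mean_accuracy(y_true, always_majority))  # 0.0
print(overall_accuracy(y_true, half_minority))           # 0.9
print(geometric_mean_accuracy(y_true, half_minority))    # about 0.707
```

A classifier that ignores the minority class still scores well on overall accuracy but collapses to zero on the geometric mean, which is why a boosting criterion built on the geometric mean pushes the ensemble toward minority classes.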

  • Conference Article
  • Citations: 3
  • 10.1063/1.5012168
Fuzzy support vector machine for microarray imbalanced data classification
  • Jan 1, 2017
  • Faroh Ladayya + 2 more

DNA microarray data contain gene expression measurements with small sample sizes and a high number of features. Furthermore, imbalanced classes are a common problem in microarray data. This occurs when a dataset is dominated by a class that has significantly more instances than the minority classes. Therefore, a classification method is needed that solves the problem of high-dimensional and imbalanced data. Support Vector Machine (SVM) is a classification method capable of handling large or small samples, nonlinear, high-dimensional, over-learning, and local minimum issues. SVM has been widely applied to DNA microarray data classification, and it has been shown that SVM provides the best performance among machine learning methods. However, imbalanced data are a problem because SVM treats all samples as equally important, so the results are biased against the minority class. To overcome the imbalance, Fuzzy SVM (FSVM) is proposed. This method applies a fuzzy membership to each input point and reformulates the SVM such that different input points make different contributions to the classifier. The minority classes have large fuzzy memberships, so FSVM pays more attention to the samples with larger fuzzy membership. Given that DNA microarray data are high-dimensional with a very large number of features, it is necessary to do feature selection first using the Fast Correlation-Based Filter (FCBF). In this study, the data are analyzed by SVM and FSVM, each with and without FCBF, and their classification performance is compared. Based on the overall results, FSVM on selected features has the best classification performance compared to SVM.
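
One simple way to give minority samples larger memberships, as the abstract describes, is to weight each sample by the ratio of the smallest class size to its own class size. This is only an assumed scheme for illustration; the abstract does not state which membership function the authors use, and `class_memberships` is a hypothetical helper.

```python
from collections import Counter


def class_memberships(labels):
    """Assign each sample a membership in (0, 1] based on its class size.

    Samples of the smallest class get 1.0; samples of larger classes get
    proportionally smaller values, so the minority class weighs more.
    """
    counts = Counter(labels)
    smallest = min(counts.values())
    return [smallest / counts[label] for label in labels]


# Toy imbalanced labels: 2 tumor samples against 8 normal samples.
labels = ["tumor"] * 2 + ["normal"] * 8
memberships = class_memberships(labels)
print(memberships[:3])  # [1.0, 1.0, 0.25]
```

In a fuzzy SVM these memberships scale the misclassification penalty of each sample, so an error on a minority sample costs more than an error on a majority sample.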

  • Research Article
  • Citations: 231
  • 10.33889/ijmems.2020.5.4.052
Detection of coronavirus Disease (COVID-19) based on Deep Features and Support Vector Machine
  • Aug 1, 2020
  • International Journal of Mathematical, Engineering and Management Sciences
  • Prabira Kumar Sethy + 3 more

The detection of coronavirus (COVID-19) is now a critical task for medical practitioners. The coronavirus has spread quickly between people and has reached nearly 100,000 people worldwide. Consequently, it is essential to identify infected people so that measures to prevent the spread can be taken. In this paper, a methodology based on deep features plus a support vector machine (SVM) is suggested for detecting coronavirus-infected patients using X-ray images. For classification, SVM is used instead of a deep learning-based classifier, as the latter needs a large dataset for training and validation. The deep features from the fully connected layer of the CNN model are extracted and fed to the SVM for classification. The SVM distinguishes the corona-affected X-ray images from the others. The methodology considers three categories of X-ray images, i.e., COVID-19, pneumonia, and normal. The method helps medical practitioners to distinguish among COVID-19 patients, pneumonia patients, and healthy people. SVM is evaluated for detection of COVID-19 using the deep features of 13 different CNN models. The SVM produced the best results using the deep features of ResNet50. The classification model, i.e., ResNet50 plus SVM, achieved accuracy, sensitivity, FPR, and F1 score of 95.33%, 95.33%, 2.33%, and 95.34%, respectively, for detection of COVID-19 (ignoring SARS, MERS, and ARDS). Again, the highest accuracy achieved by ResNet50 plus SVM is 98.66%. The results are based on the X-ray images available in repositories on GitHub and Kaggle. As the data set numbers in the hundreds, the classification based on SVM is more robust compared to the transfer learning approach. A comparison with other traditional classification methods is also carried out. The traditional methods are local binary patterns (LBP) plus SVM, histogram of oriented gradients (HOG) plus SVM, and Gray Level Co-occurrence Matrix (GLCM) plus SVM. Among the traditional image classification methods, LBP plus SVM achieved 93.4% accuracy.
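
For readers unfamiliar with the four metrics quoted above, the sketch below shows how accuracy, sensitivity, false positive rate (FPR), and F1 score follow from binary confusion counts. The counts are made up for illustration and are not the paper's data. The paper's three-class setup would need per-class or averaged versions of these formulas, which the abstract does not specify.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # also called recall or true positive rate
    fpr = fp / (fp + tn)           # false positive rate = 1 - specificity
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "fpr": fpr,
        "f1": f1,
    }


# Hypothetical counts: 100 positive and 100 negative test images.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts the accuracy is 0.925, the sensitivity is 0.9, and the FPR is 0.05.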

  • Conference Article
  • Citations: 12
  • 10.1109/bigcomp.2014.6741445
Ensemble method for classification of high-dimensional data
  • Jan 1, 2014
  • Yongjun Piao + 3 more

Ensemble methods, also known as classifier combination, are often used to improve classification performance. The growing dimensionality of data poses various challenges for supervised learning. Commonly used classification methods such as decision trees, neural networks, and support vector machines are difficult to apply directly to high-dimensional datasets. In this paper, we propose an ensemble method for classification of high-dimensional data, with each classifier constructed from a different set of features determined by a partition of redundant features. In our method, the redundancy of features is considered in dividing the original feature space. Then, a support vector machine is trained on each generated feature subset, and the results of the classifiers are combined by majority voting. The efficiency and effectiveness of our method are demonstrated through comparisons with other ensemble techniques, and the results show that our method outperforms the other methods.
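
The final combination step mentioned in this abstract, majority voting over the per-subset classifiers, can be sketched in a few lines. The example assumes each classifier has already produced one label per sample. The feature partitioning and SVM training are not shown, and `majority_vote` is a name chosen here.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several classifiers by majority vote.

    predictions: list with one list of labels per classifier, all of equal
    length. Ties go to the label seen first among the tied labels.
    """
    combined = []
    for votes in zip(*predictions):  # votes for one sample across classifiers
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three classifiers on three samples.
predictions = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
]
print(majority_vote(predictions))  # [0, 0, 1]
```

Using an odd number of classifiers avoids ties in the two-class case.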

  • Book Chapter
  • Citations: 2
  • 10.1007/978-0-387-69319-4_15
A Hybrid Knowledge Based-Clustering Multi-Class SVM Approach for Genes Expression Analysis
  • Jan 1, 2007
  • Budi Santosa + 2 more

This study utilizes Support Vector Machines (SVM) for multi-class classification of a real data set with more than two classes. The data is a set of E. coli whole-genome gene expression profiles. The problem is how to classify these genes based on their behavior in response to changing pH of the growth medium and mutation of the acid tolerance response gene regulator GadX. In order to apply these techniques, we first have to label the genes. The labels indicate the response of genes to the experimental variables: 1-unchanged, 2-decreased expression level, and 3-increased expression level. To label the genes, an unsupervised K-Means clustering technique is applied in a multi-level scheme. Multi-level K-Means clustering is itself an improvement over standard K-Means applications. SVM is used here in two ways. First, labels resulting from multi-level K-Means clustering are confirmed by SVM. To judge the performance of SVM, two other methods, K-nearest neighbor (KNN) and Linear Discriminant Analysis (LDA), are implemented. The multi-class SVM implementation used the one-against-one and one-against-all methods. The results show that SVM outperforms KNN and LDA. The advantages of SVM include its generalization error and computing time. Second, different from the first application, SVM is used to label the genes after it is trained on a set of training data obtained from K-Means clustering. This alternative SVM strategy offers an improvement over standard SVM applications.
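
The two multi-class schemes named in this abstract differ in which binary problems they build. The sketch below only enumerates those problems for the three labels used in the study. It is a generic illustration rather than the authors' implementation, and the function names are chosen here.

```python
from itertools import combinations


def one_against_one_tasks(classes):
    """One binary classifier per unordered pair of classes."""
    return list(combinations(classes, 2))


def one_against_all_tasks(classes):
    """One binary classifier per class, against all remaining classes."""
    return [(c, [other for other in classes if other != c]) for c in classes]


labels = [1, 2, 3]  # 1-unchanged, 2-decreased, 3-increased expression
print(one_against_one_tasks(labels))  # [(1, 2), (1, 3), (2, 3)]
print(one_against_all_tasks(labels))  # [(1, [2, 3]), (2, [1, 3]), (3, [1, 2])]
```

With k classes, one-against-one trains k(k-1)/2 classifiers and one-against-all trains k. The counts coincide at k = 3 and diverge for larger k, for example 10 versus 5 when k = 5.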

  • Research Article
  • Citations: 19
  • 10.3233/jifs-191265
Big data analysis with artificial intelligence technology based on machine learning algorithm
  • Jan 1, 2020
  • Journal of Intelligent & Fuzzy Systems
  • Zeliang Zhang

Artificial intelligence technology has been applied successfully in big data analysis tasks such as data classification. In this paper, the application of the support vector machine (SVM) method from machine learning to multi-class classification was analyzed. To improve classification performance, an improved one-to-one SVM multi-classification method was designed by combining SVM with the K-nearest neighbor (KNN) method. The method was then tested using the UCI public data set, the Statlog statistical data set, and actual data. The results showed that the overall classification accuracies of the one-to-many SVM, one-to-one SVM, and improved one-to-one SVM were 72.5%, 77.25%, and 91.5%, respectively, on the UCI public data set and the Statlog statistical data set. In the classification of transformer fault data, the total classification accuracies of the neural network, decision tree, basic one-to-one SVM, directed acyclic graph improved one-to-one SVM, fuzzy decision method improved one-to-one SVM, and the improved one-to-one SVM proposed in this study were 83.98%, 84.55%, 74.07%, 81.5%, 82.68%, and 92.9%, respectively, which demonstrated that the improved one-to-one SVM had good reliability. This study provides some theoretical bases for the application of methods such as machine learning in big data analysis.

  • Research Article
  • 10.9734/ajrcos/2024/v17i9504
Classification of Lung Cancer Using SVM with Feature Selection Based on PSO-ROC
  • Sep 27, 2024
  • Asian Journal of Research in Computer Science
  • S Sivakumar

Lung cancer has become a very serious global issue, and machine learning is one way to classify it. This study addresses two challenges: how to apply Particle Swarm Optimization rate of change (PSO-ROC) as a feature selection method with a support vector machine (SVM) as the classifier in lung cancer classification; and how to compare the accuracy and running time of SVM without feature reduction or selection, SVM with PSO feature selection, and SVM with PSO-ROC feature selection. The purpose of this work is to use SVM with feature selection based on the PSO-ROC algorithm to classify lung cancer. Three classification methods were used in this study: first, SVM classification without feature reduction or feature selection; second, SVM with PSO feature selection; and third, SVM with PSO-ROC feature selection. There are two categories for cancer: malignant and non-cancerous. The findings of this study should help the medical community categorize cancer, especially lung cancer, more quickly and accurately. PSO-ROC-based feature selection selects a limited number of attributes and yields high classification accuracy compared to the other methods.

  • Conference Article
  • Citations: 1
  • 10.1063/1.5045446
Classification with neural network and SVM via decision tree algorithm
  • Jan 1, 2018
  • John Tsiligaridis

This work provides a method for classification using a Support Vector Machine (SVM) via a Decision Tree algorithm. A probabilistic Decision Tree algorithm focusing on large frequency classes (DTPL) is developed. A method for SVM classification (DT_SVM) using Tabu Search (TS) via DTs is developed. In order to reduce the training complexity of the SVM, the DTPL performs partitions that can be treated as clusters. The TS algorithm provides the ability to approximate the decision boundary of an SVM. Based on DTs, an SVM algorithm is developed to improve the training time of the SVM by considering a subset of the cluster’s instances. A Neural Network (NN) is composed of many neurons that are linked together according to a specific network topology. The main characteristics of SVM and NN are presented. A comparison between NN and SVM with two types of kernels shows the superiority of the SVM. Simulation results for all the algorithms on data sets of different complexity are provided.

  • Conference Article
  • Citations: 3
  • 10.1109/wcica.2008.4593638
The geometric relationship between Core Vector Machine and Support Vector Machine
  • Jan 1, 2008
  • Liang Chang

Core vector machine (CVM) is an efficient kernel method for large data classification. It has prominent advantages in dealing with large data sets in high-dimensional space. This paper presents a novel geometric framework between CVM and the traditional support vector machine (SVM). We proved theoretically that: (1) In one-class classification, non-training examples on the surface of the exact minimum enclosing ball (MEB) in CVM belong to the optimal separating hyperplane in SVM; (2) In one-class classification, training examples on the surface of the exact MEB in CVM correspond to the support vectors in SVM; (3) In two-class classification, non-training examples on the surface of the exact MEB in CVM belong to the bounding hyperplanes in SVM; (4) In two-class classification, training examples on the surface of the exact MEB in CVM correspond to the support vectors in SVM. Geometric interpretations for points on the (1 + ε)-approximate MEB in CVM are presented as well. It is believed that the obtained geometric relationship will be helpful in analyzing CVM and inspiring new classification algorithms.
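
For reference, the exact MEB mentioned above is usually written as the optimization problem below. The notation is assumed here rather than taken from the abstract: φ is the kernel feature map, c the center, R the radius, R* the radius of the exact MEB, and m the number of training points. The second line gives the commonly used notion of a (1 + ε)-approximation, stated as a sketch of the standard definition.

```latex
% Exact minimum enclosing ball of the mapped training points
\min_{c,\,R}\; R^2
\quad \text{subject to} \quad
\lVert \varphi(x_i) - c \rVert^2 \le R^2, \qquad i = 1, \dots, m

% (1 + \varepsilon)-approximation: a ball of radius R \le R^{*} whose
% enlargement by the factor (1 + \varepsilon) covers every point
\lVert \varphi(x_i) - c \rVert \le (1 + \varepsilon)\, R, \qquad i = 1, \dots, m,
\qquad R \le R^{*}
```

Points satisfying the first constraint with equality lie on the surface of the ball, which is the set the four statements above relate to SVM hyperplanes and support vectors.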

  • Research Article
  • Citations: 3
  • 10.1166/jmihi.2020.3042
Classification of Medical Text Data Using Convolutional Neural Network-Support Vector Machine Method
  • Jul 1, 2020
  • Journal of Medical Imaging and Health Informatics
  • Lan Liu + 3 more

Conventional methods of medical text data classification neglect the context among different words and semantic information, and therefore suffer from poor text description, classification performance, generalization capability, and robustness. To tackle the inefficiency and low precision in the classification of medical text data, this paper presents a new classification method that combines an improved convolutional neural network (CNN) with a support vector machine (SVM), i.e., the CNN-SVM method. In the method, convolution kernel filters that contribute greatly to the CNN model are first selected by the average response energy (ARE) value and then used to simplify and reconstruct the CNN model. Next, the SVM classifier is optimized by the firefly algorithm (FA) and context information to overcome the disadvantages of over-saturation and over-training in SVM classification. Finally, the presented CNN-SVM method is tested in a simulation experiment and on real medical text data classification. The experimental results show that the presented CNN-SVM method can significantly reduce complexity and the amount of computation compared to conventional methods, and further improve the computational efficiency and classification accuracy for medical text data.

  • Research Article
  • Citations: 66
  • 10.1016/j.ins.2022.12.090
Non-parallel bounded support matrix machine and its application in roller bearing fault diagnosis
  • Jan 2, 2023
  • Information Sciences
  • Haiyang Pan + 3 more


  • Research Article
  • Citations: 34
  • 10.1109/jstars.2014.2346475
Single-Species Detection With Airborne Imaging Spectroscopy Data: A Comparison of Support Vector Techniques
  • Jun 1, 2015
  • IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
  • Claire A Baldeck + 1 more

Progress in mapping plant species remotely with imaging spectroscopy data is limited by the traditional classification framework, which carries the requirement of exhaustively defining all classes (species) encountered in a landscape. As the research objective may be to map only one or a few species of interest, we need to explore alternative classification methods that may be used to more efficiently detect a single species. We compared the performance of three support vector machine (SVM) methods designed for single-class detection—binary (one-against-all) SVM, one-class SVM, and biased SVM—in detecting five focal tree and shrub species using data collected by the Carnegie Airborne Observatory over an African savanna. Prior to this comparison, we investigated the effects of training data amount and balance on binary SVM and evaluated alternative methods for tuning one-class and biased SVMs. A key finding was that biased SVM was generally best parameterized by crown-level cross validation paired with the tuning criterion proposed by Lee and Liu [1] . Among the different single-class methods, binary SVM showed the best overall performance (average F-scores 0.43–0.78 among species), whereas one-class SVM showed very poor performance (F-scores 0.09–0.46). However, biased SVM produced results similar to those obtained with binary SVM (F-scores 0.40–0.72), despite using labeled training data from only the focal class. Our results indicate that both binary and biased SVMs can work well for remote single-species detection, while both methods, particularly biased SVM, greatly reduce the amount of training data required compared with traditional multispecies classification.

  • Research Article
  • Citations: 35
  • 10.1016/j.isprsjprs.2013.11.004
An innovative support vector machine based method for contextual image classification
  • Dec 23, 2013
  • ISPRS Journal of Photogrammetry and Remote Sensing
  • Rogério Galante Negri + 2 more


  • Research Article
  • Citations: 1
  • 10.14257/ijsh.2016.10.5.12
Study on a Novel Data Classification Method Based on Improved GA and SVM Model
  • May 31, 2016
  • International Journal of Smart Home
  • Jing Huo + 1 more

Support vector machine (SVM) can effectively solve classification problems with small samples, nonlinearity, and high dimensions, but it exhibits weak generalization ability and low classification accuracy. An improved genetic algorithm (IGA) is therefore introduced to propose a new classification method (IGASVM) that combines the improved GA with the SVM model. In the proposed IGASVM method, a self-adaptive control parameter strategy and a convergence speed improvement strategy are introduced into the GA to keep the diversity of the population, promptly detect premature convergence of individuals, and escape from local optimal solutions, improving the search performance. The improved GA is then used to optimize and determine the parameters of the SVM model in order to improve its learning ability and generalization ability, yielding the new IGASVM classification method. Finally, experimental data are selected to test the effectiveness of the proposed IGASVM method. The experimental results show that the improved GA can effectively optimize and determine the parameters of the SVM model, and that the IGASVM method achieves better learning ability, generalization ability, and classification accuracy.
