Enhancing SVM-Based Classification Performance on Indonesian Sentences through TF-IDF and Directional Augmentation

Abstract

Background: The distinction between standard and non-standard Indonesian sentences is traditionally well defined, but the ubiquity of digital communication has increasingly blurred the boundary. This convergence introduces significant lexical ambiguity in formal contexts, which complicates automated text classification.

Objective: This study aims to make Support Vector Machine (SVM) classification more robust to these linguistic irregularities by combining TF-IDF vectorization with a targeted directional augmentation strategy.

Methods: A corpus of 5,394 labeled sentences was processed under a strict anti-leak grouping strategy to prevent semantic leakage between the training, validation, and test sets. To resolve decision-boundary overlaps that the baseline model often missed, manual directional augmentation was applied to ambiguous sentence structures, enriching the training distribution and its linguistic diversity.

Results: The experiments showed that directional augmentation refined the model's decision margins. The baseline model achieved a test accuracy of 94.39%. The augmented approach generalized better to unseen groups, raising validation accuracy from 96.11% to 97.39% (+1.28 percentage points) and test accuracy to 96.16% (+1.77 percentage points over the baseline).

Conclusion: These findings substantiate that structurally enriching the dataset mitigates overfitting and improves sensitivity. However, because manual intervention does not scale, future research should prioritize automated augmentation techniques and contextual embeddings to better capture deeper linguistic nuances.
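The abstract does not describe how the anti-leak grouping was implemented. As a minimal sketch of the general idea only, the Python below assumes every sentence carries a group id and that augmented variants reuse the group id of their source sentence; each group is then hashed into exactly one split, so no group can appear in both training and evaluation data. The function names `assign_split` and `grouped_split`, the 10% split fractions, and the toy sentences are all hypothetical and are not taken from the study.

```python
import hashlib
from collections import defaultdict


def assign_split(group_id, val_frac=0.1, test_frac=0.1):
    """Deterministically map a group id to 'train', 'val', or 'test'."""
    digest = hashlib.sha256(group_id.encode("utf-8")).hexdigest()
    # First 32 bits of the hash, scaled to a value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"


def grouped_split(records, val_frac=0.1, test_frac=0.1):
    """Split (group_id, sentence, label) records so that no group spans two splits."""
    splits = defaultdict(list)
    for group_id, sentence, label in records:
        name = assign_split(group_id, val_frac, test_frac)
        splits[name].append((group_id, sentence, label))
    return splits


# Illustrative toy data (assumption): variants of one sentence share a group id.
records = [
    ("g1", "Saya tidak tahu jawabannya.", "standard"),
    ("g1", "Aku nggak tahu jawabannya.", "non-standard"),
    ("g2", "Apakah Anda sudah makan?", "standard"),
    ("g2", "Kamu udah makan belum?", "non-standard"),
    ("g3", "Terima kasih atas bantuannya.", "standard"),
]

if __name__ == "__main__":
    for name, rows in grouped_split(records).items():
        print(name, [group_id for group_id, _, _ in rows])
```

Hashing the group id rather than the sentence keeps the assignment stable when augmented sentences are added later: a new variant inherits its source's group id and therefore lands in the same split. With only a handful of groups, as in the toy data, some splits may be empty.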

Similar Papers
  • Research Article
  • Citations: 5
  • 10.11591/telkomnika.v12i1.3969
Sensitivity of Support Vector Machine Classification to Various Training Features
  • Jan 1, 2014
  • TELKOMNIKA Indonesian Journal of Electrical Engineering
  • Nanhai Yang + 3 more

Remote sensing image classification is one of the most important techniques in image interpretation and can be used for environmental monitoring, evaluation, and prediction. Many algorithms have been developed for image classification in the literature. Support vector machine (SVM) is a supervised classification method that has been widely used recently. The classification accuracy produced by SVM may vary depending on the choice of training features. In this paper, SVM was used for land cover classification using Quickbird images. Spectral and textural features were extracted for the classification, and the results were analyzed thoroughly. Results showed that using more features in SVM did not necessarily give better results. Different features are suitable for extracting different types of land cover. This study verifies the effectiveness and robustness of SVM in the classification of high spatial resolution remote sensing images.

  • Research Article
  • Citations: 54
  • 10.1016/j.jmva.2011.01.009
On qualitative robustness of support vector machines
  • Feb 5, 2011
  • Journal of Multivariate Analysis
  • Robert Hable + 1 more

  • Conference Article
  • Citations: 8
  • 10.1109/icspcs47537.2019.9008746
On the Robustness of Support Vector Machines against Adversarial Examples
  • Dec 1, 2019
  • Peter Langenberg + 3 more

In this paper, the robustness of Support Vector Machines (SVMs) against adversarial instances is considered in relation to the design parameters. After generating adversarial instances using convex programming, it is shown through extensive numerical analysis that the robustness is significantly affected by parameters which change the linearity of the models. Interestingly, robustness is only slightly sensitive to the parameter determining the margin between classes. It is shown that adversarial robustness not only depends on the geometric properties of the classifier but is also subject to the accuracy of the model. The results are discussed in the light of the so-called linearity hypothesis, regarding adversarial robustness of machine learning algorithms.

  • Research Article
  • Citations: 4
  • 10.5405/jmbe.1776
Support-vector-machine-based Meditation Experience Evaluation Using Electroencephalography Signals
  • Jan 1, 2014
  • Journal of Medical and Biological Engineering
  • Yu-Hao Lee

Meditation is used to improve psychological well-being, but there is no scientific quantitative evidence to prove the relation between them. Therefore, in this study, an effective classifier, namely a support vector machine (SVM), is applied to classify meditation experiences and help validate the interaction between emotional stability and a meditation experience. Three groups (10 subjects in each), created based on practice experience in meditation (S group with 10-30 years, J group with 1-7 years, and N group with 0 years of experience in Tibetan Nyingmapa meditation), were recruited to receive visual stimuli in the form of affective pictures. The images shown were selected from the International Affective Pictures System (IAPS), a confidential database. The response signals were acquired through physiological examination via electroencephalography (EEG). The subjects' data were entered into two classification systems, namely those based on the classification and regression tree (CART) method and the SVM method, respectively, and the outcomes were compared. From the classification results, SVM had a higher accuracy rate (98%) than that of CART (79%). The stability and robustness of SVM are higher than those of CART, as determined by performing over 100 repetitions and using various variable numbers. An evaluator based on SVM can thus assess a meditation experience through visual emotional stimulation. The results can help explain emotional stability during meditation.

  • Conference Article
  • Citations: 4
  • 10.1109/icmse.2016.8365486
Application of SVM in the estimation of GCV of coal and a comparison study of the accuracy and robustness of SVM
  • Aug 1, 2016
  • Jin-Hui Fu

Gross calorific value (GCV, HHV) is an important property of coal, but its time-consuming mensuration cannot always satisfy the practical demands. This paper investigates the application of statistics models to measure GCV quickly and accurately using coal components with mensuration that has been achieved in real time on-line in China to meet practical demands. Linear regression (LM), nonlinear regression equation (NLM), and artificial neural networks (ANN) have been developed for the estimation of GCV by researchers. In this paper, 1400 data points are used to predict the GCV of China coal. The estimating methodology progress is determined using the support vector machine (SVM), and the estimating robustness is evaluated. The comparison study manifested that the SVM model outperformed the three existing models in terms of accuracy and robustness. Meanwhile, the sampling method is improved, and the input variables are reduced to those that can be measured in real time on-line.

  • Research Article
  • Citations: 674
  • 10.1016/j.jag.2009.06.002
A kernel functions analysis for support vector machines for land cover classification
  • Jul 3, 2009
  • International Journal of Applied Earth Observation and Geoinformation
  • T Kavzoglu + 1 more

  • Conference Article
  • Citations: 11
  • 10.36334/modsim.2011.a5.saberi
Comparing performance and robustness of SVM and ANN for fault diagnosis in a centrifugal pump
  • Dec 12, 2011
  • Morteza Saberi + 2 more

Fault detection and diagnosis play an effective role in the safe operation and long life of systems. Condition monitoring is an appropriate maintenance technique that is applicable to fault diagnosis in rotating machinery. We considered the Support Vector Machine (SVM) method for classifying the condition of a centrifugal pump into two types of faults using six features: flow, temperature, suction pressure, discharge pressure, velocity, and vibration. The SVM method is based on statistical learning theory (SLT) and is powerful for problems with small samples, nonlinearity, and high dimensionality (L.V. Ganyun et al., 2005). The SVM classifier is implemented with four kernel functions, and their results are compared. We use an Artificial Neural Network (ANN) as a second classification method to compare the performance of the two methods. After applying the two methods to our data set, we add noise to the data set and rerun the SVMs and the ANN to compare their robustness in noisy conditions. The results confirmed the superiority of SVM with some specific kernel functions.

  • Research Article
  • Citations: 5
  • 10.47852/bonviewaaes42022418
Comparative Machine Learning Models for Predicting Loan Fructification in a Semi-Urban Area
  • Mar 19, 2024
  • Archives of Advanced Engineering Science
  • Héritier Nsenge Mpia + 2 more

The current research proposes a reliable and robust machine learning (ML) model that outperforms six other models in predicting loan fructification by entrepreneurs in a semi-urban area. The proposed model predicts whether an entrepreneur can grow a loan obtained from a microfinance firm, a bank, a financial company, or an individual. It uses primary data collected from entrepreneurs residing in Butembo, a semi-urban town in the eastern Democratic Republic of Congo. The dataset contains 5868 records. The performance of seven ML models is compared for loan fructification prediction: support vector machine (SVM), random forest, extra trees, decision tree, naïve Bayes, k-nearest neighbors, and logistic regression. SVM proves to be the best model for predicting loan fructification using features such as age, the entrepreneur's years of working experience, the entrepreneur's loan repayment conviction, the means used by the lender to recover its loan, the entrepreneur's opinion on the disadvantage of taking out a loan, the entrepreneur's capacity to invest after obtaining a loan, the entrepreneur's position on the possibility of launching a business without a loan, the entrepreneur's willingness to apply again for a loan in the future, and project success after obtaining a loan. The study uses accuracy, recall, precision, and F1-score as metrics to assess the developed models. For SVM, these four metrics were 95%, 95%, 83%, and 83%, respectively. The proposed model confirms the robustness of SVM in predicting loan fructification.

Received: 3 January 2024 | Revised: 29 January 2024 | Accepted: 12 March 2024

Conflicts of Interest: The authors declare that they have no conflicts of interest to this work.

Data Availability Statement: Data are available from the corresponding author upon reasonable request.

Author Contribution Statement: Héritier Nsenge Mpia: Conceptualization, Methodology, Software, Validation, Formal analysis, Resources, Writing - original draft, Writing - review & editing, Visualization, Supervision, Project administration. Laure Mbambu Syasimwa: Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Visualization, Supervision. Dorcas Masika Muyisa: Validation, Data curation, Writing - original draft, Writing - review & editing, Visualization, Supervision, Project administration.

  • Research Article
  • Citations: 38
  • 10.1016/j.bmc.2008.04.028
Support vector machines classification of hERG liabilities based on atom types
  • Apr 16, 2008
  • Bioorganic & Medicinal Chemistry
  • Lei Jia + 1 more

  • Research Article
  • Citations: 63
  • 10.1016/j.eswa.2021.115691
Handling the impact of feature uncertainties on SVM: A robust approach based on Sobol sensitivity analysis
  • Oct 19, 2021
  • Expert Systems with Applications
  • Wahb Zouhri + 2 more

  • Research Article
  • Citations: 59
  • 10.5589/m12-022
Simultaneous feature selection and SVM parameter determination in classification of hyperspectral imagery using Ant Colony Optimization
  • Jan 1, 2012
  • Canadian Journal of Remote Sensing
  • Farhad Samadzadegan + 2 more

Hyperspectral remote sensing imagery, due to its rich source of spectral information, provides an efficient tool for land cover classification in complex geographical areas. However, the high-dimensional space of this imagery poses two important challenges in the classification process: the Hughes phenomenon and the existence of relevant and redundant features. The robustness of Support Vector Machines (SVM) in high-dimensional space makes them an efficient tool for classifying hyperspectral imagery. However, optimum SVM parameter determination and optimum feature selection are two optimization issues that strongly affect SVM performance. Traditional optimization algorithms can discover optimum solutions in a limited search space with one local optimum. In high-dimensional space, however, traditional optimization algorithms usually get trapped in a local optimum, so it is necessary to apply meta-heuristic optimization algorithms to obtain near-global optimum solutions. This study evaluates the potential of Ant Colony Optimization (ACO) for determining SVM parameters and selecting features. Results obtained from AVIRIS and ROSIS hyperspectral datasets demonstrate the superior performance of SVM, achieved by simultaneously optimizing SVM parameters and subsets of the input features. For comparison, the evaluation is also performed with other meta-heuristic optimization algorithms such as simulated annealing, tabu search, and genetic algorithm. The results demonstrate better performance of the ACO-based algorithm with regard to improving the classification accuracy and decreasing the size of the selected feature subsets.

  • Research Article
  • Citations: 3
  • 10.47065/josyc.v3i4.2072
Recommender System Based on Tweets with Singular Value Decomposition and Support Vector Machine Classification
  • Sep 3, 2022
  • Journal of Computer System and Informatics (JoSYC)
  • Rafi Anandita Wicaksono + 1 more

In modern times, the movie industry is growing rapidly. Netflix is one of the platforms that can be used to watch movies, and it provides many genres and movie titles. So many genres and titles can make it difficult for people to choose a movie to watch; one solution is a recommendation system that recommends movies based on user ratings. One method used in recommendation systems is collaborative filtering, and one collaborative filtering algorithm is singular value decomposition. Twitter is one of the places where people often write their opinions about the movies they have watched, and these tweets can be processed into rating data. In this system, tweets are the input and are processed into rated data. This research implements a recommendation system based on ratings from tweets, using collaborative filtering combined with the Singular Value Decomposition (SVD) algorithm and Support Vector Machine (SVM) classification, in both user-based and item-based settings. The aim is to produce a good movie recommendation model that gives accurate predictions for recommended and non-recommended movies. The test results show that collaborative filtering achieves its best RMSE of 0.8162 for user-based and 0.5911 for item-based filtering. The combination of SVD and SVM classification with hyperparameter tuning yielded 81% precision and 81% recall for user-based filtering, and 80% precision and 80% recall for item-based filtering.

  • Book Chapter
  • Citations: 1
  • 10.1007/978-981-33-4046-6_15
Real Estate Sales Forecasting with SVM Classification
  • Jan 1, 2021
  • Arti Patle + 1 more

The real estate market has a very important role in our society. It is related to development and to a person’s fundamental needs, so correct forecasting of sales and demand for real estate is very significant. SVM is a generous type of learning machine that solves classification with limited-sample learning and nonlinear classification, and also handles the “curse of dimensionality”. SVM has powerful classification capability, and feature selection, kernel selection, and parameter optimization add to the classification accuracy. This paper focuses on real estate sales forecasting and the booking scenario on the basis of customer enquiry features. The paper follows the approach of Support Vector Machine (SVM) classification to forecast sales in real estate. SVM is a type of machine learning algorithm that infers knowledge for the prediction of sales. The proposed model helps real estate professionals decide on the further stages of construction or the launch of a new project according to sales and demand. For the classification, data is gathered from a real estate project. SVM classification accuracy is measured with a polynomial kernel and feature selection. The optimal solution can be found and the forecasting effect can be achieved by SVM classification. The experimental results show that SVM has good forecasting capability. The results also show how classification in real estate provides a solution for sales forecasting.

Keywords: Sales forecasting, Real estate, Kernel, Feature, Support vector

  • Conference Article
  • Citations: 4
  • 10.1109/scs.2003.1227126
On signal detection using support vector machines
  • Jul 10, 2003
  • A Burian + 1 more

Detection-type problems represent a special case of nonlinear mapping. This fact makes the use of neural networks attractive for signal detection problems. However, excessive tuning is needed to obtain good generalization. Also, most neural network learning theories do not make use of the optimal hyperplane concept. In this paper, we consider optimal hyperplane signal detection with support vector machines (SVMs) for detecting a known signal corrupted by noise. Experimental results illustrate the detection performance in various cases. The practical implementation and the robustness of SVMs are also considered.

  • Research Article
  • Citations: 119
  • 10.1016/j.jngse.2011.05.002
Fuzzy logic-driven and SVM-driven hybrid computational intelligence models applied to oil and gas reservoir characterization
  • Jul 1, 2011
  • Journal of Natural Gas Science and Engineering
  • Fatai Anifowose + 1 more
