Enhancing SVM-Based Classification Performance on Indonesian Sentences through TF-IDF and Directional Augmentation
Background: The distinction between standard and non-standard Indonesian sentences is traditionally well-defined, yet the ubiquity of digital communication has increasingly blurred these boundaries. This convergence introduces significant lexical ambiguity in formal contexts, degrading the performance of automated text classification systems. Objective: This study aims to enhance the robustness of Support Vector Machine (SVM) classification by addressing these linguistic irregularities through TF-IDF vectorization and a targeted directional augmentation strategy. Methods: A corpus comprising 5,394 labeled sentences was processed under a strict anti-leak grouping strategy to rigorously prevent semantic leakage between training, validation, and testing sets. To resolve decision boundary overlaps often missed by the baseline model, manual directional augmentation was applied, specifically targeting ambiguous sentence structures to enrich the training distribution and linguistic diversity. Results: The experiments demonstrated that directional augmentation significantly refined the model's decision margins. While the baseline model achieved a test accuracy of 94.39%, the augmented approach substantially improved generalization capabilities across unseen groups, elevating validation accuracy from 96.11% to 97.39% and test accuracy to 96.16%. Conclusion: These findings substantiate that structurally enriching the dataset effectively mitigates overfitting and improves sensitivity. However, given the scalability constraints of manual intervention, future research should prioritize automated augmentation techniques and contextual embeddings to better capture deep linguistic nuances.
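The anti-leak grouping and TF-IDF steps described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the group ids, the example sentences, the split fractions, the whitespace tokenizer, and the helper names `assign_split` and `tfidf` are all assumptions made for illustration, and the SVM training step itself is omitted. The idea shown is that every sentence sharing a group id lands in the same split, so near-duplicate variants cannot leak from training into validation or test.

```python
import hashlib
import math
from collections import Counter


def assign_split(group_id, val_frac=0.1, test_frac=0.1):
    """Map a whole group to one split by hashing its id (deterministic)."""
    digest = hashlib.sha256(group_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Uses raw term counts, a smoothed IDF of ln((1 + n) / (1 + df)) + 1,
    and L2 normalisation of each document vector.
    """
    tokenised = [d.lower().split() for d in docs]
    n = len(tokenised)
    df = Counter(term for toks in tokenised for term in set(toks))
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


# Hypothetical toy rows: (group_id, sentence, label). Not from the paper's corpus.
rows = [
    ("g1", "saya tidak tahu", "standard"),
    ("g1", "aku nggak tau", "non-standard"),
    ("g2", "mereka sedang makan", "standard"),
    ("g2", "mereka lagi makan", "non-standard"),
]

# Route each row to the split chosen for its group, then vectorise the text.
splits = {}
for gid, sentence, label in rows:
    splits.setdefault(assign_split(gid), []).append((gid, sentence, label))
vectors = tfidf([sentence for _, sentence, _ in rows])
```

Hashing the group id rather than shuffling rows keeps the assignment stable when new sentences are added to an existing group, which matters if augmented variants are appended later. In a real experiment the IDF statistics would be fitted on the training split only.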
- Research Article
5
- 10.11591/telkomnika.v12i1.3969
- Jan 1, 2014
- TELKOMNIKA Indonesian Journal of Electrical Engineering
Remote sensing image classification is one of the most important techniques in image interpretation and can be used for environmental monitoring, evaluation, and prediction. Many algorithms have been developed for image classification in the literature. The support vector machine (SVM) is a supervised classification method that has been widely used in recent years. The classification accuracy produced by SVM may vary depending on the choice of training features. In this paper, SVM was used for land cover classification using Quickbird images. Spectral and textural features were extracted for the classification and the results were analyzed thoroughly. Results showed that using more features in SVM did not necessarily yield better accuracy; different features suit the extraction of different land cover types. This study verifies the effectiveness and robustness of SVM in the classification of high spatial resolution remote sensing images.
- Research Article
54
- 10.1016/j.jmva.2011.01.009
- Feb 5, 2011
- Journal of Multivariate Analysis
On qualitative robustness of support vector machines
- Conference Article
8
- 10.1109/icspcs47537.2019.9008746
- Dec 1, 2019
In this paper, the robustness of Support Vector Machines (SVMs) against adversarial instances is considered in relation to the design parameters. After generating adversarial instances using convex programming, it is shown through extensive numerical analysis that the robustness is significantly affected by parameters which change the linearity of the models. Interestingly, robustness is only slightly sensitive to the parameter determining the margin between classes. It is shown that adversarial robustness not only depends on the geometric properties of the classifier but is also subject to the accuracy of the model. The results are discussed in the light of the so-called linearity hypothesis, regarding adversarial robustness of machine learning algorithms.
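For intuition about adversarial instances against a linear classifier, the special case of a linear decision function f(x) = w·x + b under an L2 norm has a closed-form answer: the smallest perturbation that reaches the boundary is -f(x)·w/||w||². The sketch below computes it with a tiny overshoot so the point actually crosses the hyperplane. The weights, the input point, and the function names are hypothetical, and this does not reproduce the convex-programming formulation used in the paper.

```python
import math


def decision(w, b, x):
    """Linear decision function f(x) = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def minimal_l2_perturbation(w, b, x, overshoot=1e-6):
    """Smallest L2 shift that moves x just across the hyperplane w.x + b = 0."""
    f = decision(w, b, x)
    norm_sq = sum(wi * wi for wi in w)
    scale = -(1 + overshoot) * f / norm_sq  # step along -w (or +w if f < 0)
    return [scale * wi for wi in w]


w, b = [2.0, -1.0], 0.5  # hypothetical trained weights and bias
x = [1.0, 1.0]           # lies on the positive side: f(x) = 1.5

delta = minimal_l2_perturbation(w, b, x)
x_adv = [xi + di for xi, di in zip(x, delta)]
distance = math.sqrt(sum(d * d for d in delta))  # about |f(x)| / ||w||
```

The perturbation length equals the point's geometric distance to the hyperplane, which is why, for linear models, points far from the boundary are harder to flip.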
- Research Article
4
- 10.5405/jmbe.1776
- Jan 1, 2014
- Journal of Medical and Biological Engineering
Meditation is used to improve psychological well-being, but there is no scientific quantitative evidence to prove the relation between them. Therefore, in this study, an effective classifier, namely a support vector machine (SVM), is applied to classify meditation experiences and help validate the interaction between emotional stability and a meditation experience. Three groups (10 subjects in each), created based on practice experience in meditation (S group with 10-30 years, J group with 1-7 years, and N group with 0 years of experience in Tibetan Nyingmapa meditation), were recruited to receive visual stimuli in the form of affective pictures. The images shown were selected from the International Affective Pictures System (IAPS), a confidential database. The response signals were acquired through physiological examination via electroencephalography (EEG). The subjects' data were entered into two classification systems, namely those based on the classification and regression tree (CART) method and the SVM method, respectively, and the outcomes were compared. From the classification results, SVM had a higher accuracy rate (98%) than that of CART (79%). The stability and robustness of SVM are higher than those of CART, as determined by performing over 100 repetitions and using various variable numbers. An evaluator based on SVM can thus assess a meditation experience through visual emotional stimulation. The results can help explain emotional stability during meditation.
- Conference Article
4
- 10.1109/icmse.2016.8365486
- Aug 1, 2016
Gross calorific value (GCV, HHV) is an important property of coal, but its time-consuming measurement cannot always satisfy practical demands. This paper investigates the application of statistical models to estimate GCV quickly and accurately from coal components whose measurement has been achieved in real time on-line in China. Linear regression (LM), nonlinear regression (NLM), and artificial neural networks (ANN) have previously been developed by researchers for the estimation of GCV. In this paper, 1400 data points are used to predict the GCV of Chinese coal. The estimation method is built using the support vector machine (SVM), and its robustness is evaluated. The comparison showed that the SVM model outperformed the three existing models in terms of accuracy and robustness. In addition, the sampling method is improved, and the input variables are reduced to those that can be measured in real time on-line.
- Research Article
674
- 10.1016/j.jag.2009.06.002
- Jul 3, 2009
- International Journal of Applied Earth Observation and Geoinformation
A kernel functions analysis for support vector machines for land cover classification
- Conference Article
11
- 10.36334/modsim.2011.a5.saberi
- Dec 12, 2011
Fault detection and diagnosis plays an important role in the safe operation and long life of systems. Condition monitoring is a maintenance technique well suited to diagnosing faults in rotating machinery. We applied the Support Vector Machine (SVM) method to classify the condition of a centrifugal pump into two types of faults using six features: flow, temperature, suction pressure, discharge pressure, velocity, and vibration. The SVM method is based on statistical learning theory (SLT) and is powerful for problems with small samples, nonlinearity, and high dimensionality (L.V. Ganyun et al 2005). The SVM classifier is implemented with 4 kernel functions and their results are compared. We use an Artificial Neural Network (ANN) as a second classification method to compare the performance of the two methods. After applying the two methods to our data set, we add noise to the data set and run the SVMs and ANN again to compare their robustness in noisy conditions. The results obtained from the two methods confirmed the superiority of SVM with some specific kernel functions.
- Research Article
5
- 10.47852/bonviewaaes42022418
- Mar 19, 2024
- Archives of Advanced Engineering Science
The current research proposes a reliable and robust machine learning (ML) model that outperforms six other models in predicting loan fructification by entrepreneurs in a semi-urban area. The proposed model predicts whether an entrepreneur can grow a loan obtained from a microfinance firm, a bank, a financial company, or an individual. The model uses as its dataset primary data collected from entrepreneurs residing in Butembo, a semi-urban town in the eastern Democratic Republic of Congo. The dataset contains 5868 records. The performance of seven ML models is compared: support vector machine (SVM), random forest, extra trees, decision tree, naïve Bayes, k-nearest neighbors, and logistic regression. SVM proves to be the best model for predicting loan fructification using features such as the entrepreneur's age and years of working experience, the entrepreneur's loan repayment conviction, the means used by the lender to recover its loan, the entrepreneur's opinion on the disadvantages of taking out a loan, the entrepreneur's capacity to invest after obtaining the loan, the entrepreneur's position on the possibility of launching a business without a loan, the entrepreneur's willingness to apply again for a loan in the future, and project success after obtaining the loan. The study uses accuracy, recall, precision, and F1-score as metrics to assess the developed models. For SVM, these four metrics scored 95%, 95%, 83%, and 83%, respectively. The results confirm the robustness of SVM in predicting loan fructification.
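The four metrics named in the abstract above (accuracy, recall, precision, and F1-score) all derive from the same confusion-matrix counts. The sketch below shows the standard binary definitions; the counts are hypothetical and are not taken from the study, and the function name `binary_metrics` is an illustrative choice.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
m = binary_metrics(tp=40, fp=10, fn=5, tn=45)
```

Because accuracy counts true negatives while precision, recall, and F1 do not, accuracy can sit well above the other three on imbalanced data, which is one reason studies report them together.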
- Research Article
38
- 10.1016/j.bmc.2008.04.028
- Apr 16, 2008
- Bioorganic & Medicinal Chemistry
Support vector machines classification of hERG liabilities based on atom types
- Research Article
63
- 10.1016/j.eswa.2021.115691
- Oct 19, 2021
- Expert Systems with Applications
Handling the impact of feature uncertainties on SVM: A robust approach based on Sobol sensitivity analysis
- Research Article
59
- 10.5589/m12-022
- Jan 1, 2012
- Canadian Journal of Remote Sensing
Hyperspectral remote sensing imagery, due to its rich source of spectral information, provides an efficient tool for land cover classifications in complex geographical areas. However, the high-dimensional space of this imagery poses two important challenges in the classification process: the Hughes phenomenon and the existence of relevant and redundant features. The robustness of Support Vector Machines (SVM) in high-dimensional space makes them an efficient tool for classifying hyperspectral imagery. However, optimum SVM parameter determination and optimum feature selection are the two optimization issues that strongly affect SVM performance. Traditional optimization algorithms can discover optimum solutions in a limited search space with one local optimum. In high-dimensional space, however, traditional optimization algorithms usually get trapped in a local optimum, so meta-heuristic optimization algorithms are needed to obtain near-global optimum solutions. This study evaluates the potential of Ant Colony Optimization (ACO) for determining SVM parameters and selecting features. Results obtained from AVIRIS and ROSIS hyperspectral datasets demonstrate the superior performance of SVM, achieved by simultaneously optimizing SVM parameters and subsets of the input features. For comparison, the evaluation is also performed with other meta-heuristic optimization algorithms such as simulated annealing, tabu search, and genetic algorithm. The results demonstrate a better performance of the ACO-based algorithm with regard to improving the classification accuracy and decreasing the size of selected feature subsets.
- Research Article
3
- 10.47065/josyc.v3i4.2072
- Sep 3, 2022
- Journal of Computer System and Informatics (JoSYC)
In modern times, the movie industry is growing rapidly. Netflix is one of the platforms that can be used to watch movies, and it offers many genres and titles. Such a large selection can make it difficult for people to choose a movie to watch; one solution is a recommendation system that recommends movies based on user ratings. One method used in recommendation systems is collaborative filtering, and one of the algorithms used in collaborative filtering is singular value decomposition. Twitter is one of the places where people often write their opinions about the movies they have watched, and in this system those tweets serve as input that is processed into rating data. This research implements a recommendation system based on ratings derived from tweets, combining collaborative filtering with the Singular Value Decomposition (SVD) algorithm and Support Vector Machine (SVM) classification, in both user-based and item-based variants. The aim is to produce a good movie recommendation model that gives accurate predictions for recommended and non-recommended movies. The test results show that collaborative filtering achieves its best RMSE value of 0.8162 for user-based and 0.5911 for item-based. The combination of SVD and SVM classification with hyperparameter tuning resulted in 81% precision and 81% recall for user-based, and 80% precision and 80% recall for item-based.
- Book Chapter
1
- 10.1007/978-981-33-4046-6_15
- Jan 1, 2021
The real estate market plays a very important role in our society. It is tied to development and to a person’s fundamental needs, so accurate forecasting of sales and demand for real estate is very significant. SVM is a general type of learning machine that handles classification with limited training samples, nonlinear classification, and the “curse of dimensionality”. SVM has powerful classification capability, and feature selection, kernel selection, and parameter optimization further improve its classification accuracy. This paper focuses on real estate sales forecasting and the booking scenario on the basis of customer enquiry features. It follows a Support Vector Machine (SVM) classification approach to forecast sales in real estate, using the machine learning algorithm to infer knowledge for sales prediction. The proposed model helps real estate professionals decide whether to proceed with a further stage of construction or launch a new project according to sales and demand. For the classification, data is gathered from a real estate project. SVM classification accuracy is measured with a polynomial kernel and feature selection. The optimal solution can be found and the forecasting effect can be achieved by SVM classification. The experimental results show that SVM has good forecasting capability. The results also show how classification in real estate provides a solution for sales forecasting.
Keywords: Sales forecasting, Real estate, Kernel, Feature, Support vector
- Conference Article
4
- 10.1109/scs.2003.1227126
- Jul 10, 2003
Detection-type problems represent a special case of nonlinear mapping, which makes neural networks attractive for signal detection. However, obtaining good generalization requires excessive tuning, and most neural network learning theories do not make use of the optimal hyperplane concept. In this paper, we consider optimal hyperplane signal detection with support vector machines (SVMs) for detecting a known signal corrupted by noise. Experimental results illustrate the detection performance in various cases. The practical implementation and the robustness of SVMs are also considered.
- Research Article
119
- 10.1016/j.jngse.2011.05.002
- Jul 1, 2011
- Journal of Natural Gas Science and Engineering
Fuzzy logic-driven and SVM-driven hybrid computational intelligence models applied to oil and gas reservoir characterization