- Research Article
- 10.1142/s2196888825500095
- Jun 18, 2025
- Vietnam Journal of Computer Science
- Nghi Hoang Khoa + 5 more
Malware threatens cybersecurity by enabling data theft, unauthorized access, and extortion. Traditional malware detection systems (MDS) struggle with the increasing volume and complexity of malware. While machine learning (ML) and deep learning (DL) offer promising solutions, they remain vulnerable to adversarial attacks that evade detection. Recent research focuses on developing adversarial datasets to retrain ML/DL-based malware detection systems, enhancing their robustness against adversarial attacks. While these methods improve detection of adversarial samples, they also cause more misclassification of non-adversarial data due to overfitting. They also scale poorly: each ML/DL-based MDS is retrained in isolation, without reusing knowledge from other MDS whose models have already been retrained, which leads to inefficiency and waste. To tackle these issues, we introduce ProDef-MDS, a proactive defense system that integrates an Adversarial Restoration (AR) module to mitigate adversarial perturbations and recover inputs to a correctly classifiable form before passing them into the malware classification model. We focus on portable executable (PE) malware on Windows to evaluate our approach’s effectiveness across various scenarios, including adversarial data generated by five white-box attacks, namely Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD10 and PGD100), DeepFool, and Carlini and Wagner (CW), and by one black-box attack based on Auxiliary Classifier Generative Adversarial Networks (ACGAN). Additionally, we assess our approach on non-adversarial data to show that it improves adversarial detection without compromising non-adversarial performance. The results obtained from the real-world dataset indicate enhanced robustness and minimal overhead, offering a proactive solution to adversarial threats in MDS. This approach outperforms the retraining defense method under the five white-box attacks and also performs better in non-adversarial scenarios.
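To make the simplest of the white-box attacks named in the abstract concrete, here is a minimal sketch of FGSM against a toy logistic classifier. The model, weights, feature vector, and `eps` value are all illustrative assumptions; this is not the ProDef-MDS pipeline, and real PE malware perturbations must also preserve the file's functionality, which this toy ignores.

```python
import math

def predict(x, w, b):
    """Probability of the 'malicious' class under a toy logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def fgsm(x, y, w, b, eps):
    """One FGSM step: move x by eps along the sign of the loss gradient.

    For binary cross-entropy on a logistic model, the gradient of the
    loss with respect to the input is (p - y) * w.
    """
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical weights and sample (label 1 = malicious).
w = [2.0, -1.0, 0.5]
b = 0.0
x = [1.0, 0.5, 0.0]
y = 1

x_adv = fgsm(x, y, w, b, eps=0.5)
print(predict(x, w, b), predict(x_adv, w, b))
```

In this toy setting the original sample is scored above 0.5 and the perturbed one below it, which is the evasion behaviour that defenses such as retraining or input restoration try to counter.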
- Research Article
- 10.1142/s2196888825500125
- May 30, 2025
- Vietnam Journal of Computer Science
- Tian-Bo Deng
Tunable digital filters are required in a broad range of information technology (IT) areas where a digital filter needs to refresh its frequency response in real time during signal processing operations. When the filter’s frequency response needs to be refreshed, the user can obtain the required response on the fly by simply updating the filter coefficients. However, updating the coefficients of a recursive filter while it is in operation puts the filter’s stability at risk, because a recursive tunable filter may lose stability when its coefficients are suddenly changed. To resolve this instability issue, this paper presents a pair of parametrized unity-bounded (UB) functions that guarantee stability. The filter coefficients are expressed as UB functions of new variables, so those new variables can take unconstrained values without compromising stability, and the resulting filter is guaranteed to be stable. The paper introduces two parametrized UB functions that admit many variants, then investigates and compares their impact on design performance. To illustrate how the parametrized UB functions are used to produce a tunable filter, the paper walks through the design of a band-pass filter with a tunable passband center frequency (PCF). Design simulations confirm the guaranteed stability and provide performance comparisons of the two parametrized UB functions.
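The general idea of expressing coefficients through a bounded function can be sketched for a single second-order recursive section with denominator 1 + a1 z^-1 + a2 z^-2. The abstract does not specify the paper's UB functions, so this sketch assumes `tanh` as a stand-in bounded function and uses the standard stability triangle (|a2| < 1 and |a1| < 1 + a2); the function names are hypothetical.

```python
import cmath
import math

def ub_coefficients(u1, u2):
    """Map unconstrained variables (u1, u2) to denominator coefficients.

    tanh keeps a2 inside (-1, 1), and scaling by (1 + a2) keeps a1 inside
    the stability triangle |a1| < 1 + a2.
    Caveat: for very large |u| tanh rounds to exactly +/-1 in floating
    point, so practical code should clip the inputs.
    """
    a2 = math.tanh(u2)
    a1 = (1.0 + a2) * math.tanh(u1)
    return a1, a2

def pole_radii(a1, a2):
    """Magnitudes of the roots of z^2 + a1*z + a2 = 0."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return abs((-a1 + disc) / 2.0), abs((-a1 - disc) / 2.0)

# Sweep the unconstrained variables and record the largest pole radius.
grid = [-3.0, -1.0, 0.0, 0.5, 3.0]
worst = max(
    max(pole_radii(*ub_coefficients(u1, u2)))
    for u1 in grid
    for u2 in grid
)
print(worst)
```

Every pole radius in the sweep stays below 1, so the tuning variables can be changed freely without leaving the stable region, which is the property the abstract attributes to the UB parametrization.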
- Research Article
- 10.1142/s2196888825500083
- May 7, 2025
- Vietnam Journal of Computer Science
- Dharmaraj R Patil
In the field of natural language processing (NLP), sentiment analysis is very important for comprehending user views and opinions. Using the Kaggle ChatGPT sentiment analysis dataset, this work investigates the use of sophisticated ensemble machine learning algorithms to enhance ChatGPT’s sentiment classification performance. Due to its skewed distribution, the dataset poses particular difficulties in reaching high performance metrics. For sentiment classification, we evaluate the performance of several boosting techniques: AdaBoost, Gradient Boosting Machine (GBM), XGBoost, LightGBM (LGBM), and CatBoost. Moreover, ensemble techniques such as majority voting and stacking are used to improve classification results. While majority voting combines predictions for robust classification, stacking uses several base learners in conjunction with a meta-model to maximize performance. With a classification accuracy of 88.57%, precision of 88.67%, recall of 88.57%, and an F-measure of 88.57%, our tests show that stacking performs better than any other technique. Its effectiveness at limiting errors is further demonstrated by its False Positive Rate (FPR) of 0.09 and False Negative Rate (FNR) of 0.13. These findings highlight how ensemble methods, stacking in particular, can be used to handle skewed data and enhance sentiment classification capabilities. In light of ChatGPT and related conversational artificial intelligence (AI) technologies, the results offer important new information for the creation of trustworthy sentiment analysis systems.
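As a small illustration of two ingredients named in the abstract, the following sketch implements hard majority voting over per-model predictions and computes FPR and FNR from the result. The three prediction lists and the binary labels are made-up toy data, not the paper's models or dataset.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Return, for each sample, the label predicted by most models."""
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*predictions_per_model)
    ]

def fpr_fnr(y_true, y_pred):
    """False positive rate FP/(FP+TN) and false negative rate FN/(FN+TP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return fp / (fp + tn), fn / (fn + tp)

# Toy binary labels and three hypothetical base-model outputs.
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 1, 1, 0, 0, 0, 1]
model_b = [1, 1, 0, 1, 0, 0, 0, 1]
model_c = [0, 0, 1, 1, 0, 1, 1, 0]

voted = majority_vote([model_a, model_b, model_c])
fpr, fnr = fpr_fnr(y_true, voted)
print(voted, fpr, fnr)
```

Stacking differs in that the base-model outputs become input features for a trained meta-model instead of being counted as votes.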
- Research Article
- 10.1142/s2196888825500071
- Mar 29, 2025
- Vietnam Journal of Computer Science
- Thi Kim Thoa Ho + 2 more
In this paper, we propose a novel approach for link prediction on bibliographic networks using pretopology theory and text mining. A supervised learning method is used for link prediction, whose key contributions are extracting new features by analyzing the network’s structure with pretopology theory and mining textual content by combining text mining methods. The pretopological features are built using a pseudo-closure distance based on the pseudo-closure function in binary relation space and valued relation space. This approach can capture the complicated neighborhood of a set in complex networks, which graph theory alone cannot describe. Additionally, we analyze content features on a larger text corpus than preceding research that incorporates text mining methods. Finally, machine learning and deep learning models are trained on the combined pretopological and content features to predict link formation. Our proposed model was evaluated on real-world bibliographic network datasets and demonstrated superior efficiency compared to existing methods.
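A rough sketch of a pseudo-closure on a binary relation may help here. It assumes the common pretopological definition a(A) = A ∪ {x : R(x) ∩ A ≠ ∅}, and it assumes one simple reading of "pseudo-closure distance": the number of pseudo-closure expansions needed, starting from a single element, before the target is reached. The paper's exact definition may differ, and the small co-authorship-style relation below is invented.

```python
def pseudo_closure(A, neighbors):
    """Expand A by every element whose neighborhood intersects A."""
    A = set(A)
    return A | {x for x, nbrs in neighbors.items() if nbrs & A}

def pseudo_closure_distance(src, dst, neighbors):
    """Number of pseudo-closure steps from {src} until dst is included.

    Returns None when the expansion reaches a fixed point without dst.
    """
    current, steps = {src}, 0
    while dst not in current:
        expanded = pseudo_closure(current, neighbors)
        if expanded == current:
            return None
        current, steps = expanded, steps + 1
    return steps

# Hypothetical binary relation: R(x) is the set of elements related to x.
neighbors = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
    "e": set(),
}

print(pseudo_closure({"a"}, neighbors))
print(pseudo_closure_distance("a", "d", neighbors))
print(pseudo_closure_distance("a", "e", neighbors))
```

A value like this distance could then serve as one structural feature of a candidate author pair, alongside the content features the abstract mentions.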
- Research Article
- 10.1142/s219688882550006x
- Mar 28, 2025
- Vietnam Journal of Computer Science
- Sahbi Bahroun
COVID-19 is a disease that spread rapidly and quickly isolated the entire world. New variants of COVID-19 continue to cause high mortality rates. Therefore, many scientists worldwide are still looking for a way to detect COVID-19 quickly and accurately. This paper aims to detect COVID-19 using chest CT-Scan and chest X-ray images. In this work, we design a new bimodal convolutional neural network (CNN) that requires two inputs. The first input is a chest CT-Scan image segmented by a U-Net deep learning technique to detect infected areas in the lung. The second input is a chest X-ray image. Features are extracted from the two input images in two parallel feature extraction branches. The extracted feature vectors are then combined by a perceptron attention mechanism and passed to fully connected layers that classify the patient as COVID-19, non-COVID, or pneumonia. The results show that the newly designed CNN outperforms other similar state-of-the-art methods, especially in distinguishing between pneumonia and COVID-19 cases. The proposed CNN has achieved 98.79% classification accuracy and 43.20% loss. The proposed framework could be particularly beneficial in telemedicine, enabling remote diagnosis in areas with limited access to medical specialists.
- Research Article
- 10.1142/s2196888825500058
- Mar 27, 2025
- Vietnam Journal of Computer Science
- Mohammed Chachan Younis + 2 more
Because sunlight is intermittent, forecasting solar radiation is crucial to balancing energy generation and demand, which is critical for the whole grid system. Deep neural networks have become the de facto standard in many fields, including forecasting. However, these models are black boxes: given an input, it is difficult to understand why the corresponding output is produced. This aspect is crucial when a neural network is applied to a real-world scenario such as solar radiation forecasting. For this reason, in this paper we use an explainable model to forecast values on two closed datasets consisting of weather parameters in Basel, Switzerland. Both datasets contain measured values, one from January 2012 to March 2018 and the other from January to December 2020. Evaluating the results against existing models in the literature shows the superiority of the employed model in predicting solar radiation. It is anticipated that the applied model, which offers excellent performance and explainability, would resolve the black-box nature of neural network models in predicting and forecasting solar radiation.
- Research Article
- 10.1142/s2196888825300017
- Mar 18, 2025
- Vietnam Journal of Computer Science
- Trien Phat Tran + 4 more
Deep learning has emerged as a transformative approach in medicinal plant identification, addressing the critical need for accurate and scalable solutions to support biodiversity conservation, traditional medicine, and sustainable healthcare practices. This systematic literature review examines 30 papers on deep learning for medicinal plant identification, revealing diverse approaches across global contexts. Convolutional neural networks emerge as the primary technique, achieving high accuracy, particularly with leaf-based identification. Data collection methods vary, with manual fieldwork predominating. The review highlights challenges in scaling to larger species sets and using crowdsourced data, though strategies like data augmentation show promise. Plant state and maturity impact model performance, warranting further investigation. The geographical distribution of studies emphasizes the global relevance of this research, with India and China contributing the most. Mobile applications offer potential for deployment and data collection but lack robust user feedback mechanisms for model refinement. The review identifies gaps in continuous model updating and suggests exploring incremental and zero-shot learning. Overall, the field shows promise but requires more balanced datasets and context-aware approaches to maximize real-world impact in medicinal plant identification.
- Research Article
- 10.1142/s2196888825500010
- Mar 4, 2025
- Vietnam Journal of Computer Science
- Pooja Kulkarni + 1 more
The Sanskrit language holds significant importance in Indian culture because it has been extensively used in religious literature, primarily in Hinduism. Numerous ancient Hindu texts originally composed in Sanskrit have since been translated into various Indian and non-Indian languages by Indian and foreign authors. These translations offer a renewed cultural perspective and broaden the reach of Indian literature to a global audience. However, the manual translations of these religious texts often lack thorough validation. Recent advancements in semantic and sentiment analysis, powered by deep learning, have provided enhanced tools for understanding language and text. In this paper, we present a framework that uses semantic and sentiment analysis to validate the English translation of the Ramayana against its original Sanskrit version. The Ramayana, which narrates the journey of Rama, the king of Ayodhya, is an ancient Hindu epic written by the sage Valmiki. It has been known for centuries for its contribution to human values and has universal relevance. Given the importance of Sanskrit in Indian culture and its influence on literature, understanding the translations of key texts like the Ramayana is essential. The Multilingual Bidirectional Encoder Representations from Transformers (mBERT) model is used to analyze selected chapters of the English and Sanskrit versions of the Ramayana. Our analysis reveals that sentiment and semantic alignment between the original Sanskrit and the English translations remains consistent despite stylistic and vocabulary differences. The study also compares the findings of Bidirectional Encoder Representations from Transformers (BERT) with those of its variants to examine which BERT variant is most suitable for validating Sanskrit text. The paper demonstrates the potential of deep learning techniques for cross-lingual validation of ancient texts.
- Research Article
- 10.1142/s2196888825500046
- Feb 28, 2025
- Vietnam Journal of Computer Science
- Li-Hua Li + 1 more
Early recognition of plant diseases is crucial, and one practical approach is using deep learning models. Models based on multi-layer perceptrons, such as MLP-Mixer and gMLP, offer a compelling option due to their simple architecture. This study aimed to assess the performance of these models in classifying crop leaf diseases. Our findings reveal that both MLP-Mixer and gMLP models, with their similar architectures, exhibit promising performance. We conducted tests using public potato and wheat datasets to evaluate their classification performance. Furthermore, we incorporated gradient centralization during training to enhance the models’ generalization performance. The results indicate that both MLP-Mixer and gMLP achieved classification performance above 0.9100 on both datasets. Specifically, MLP-Mixer achieved 0.9819 and gMLP achieved 0.9873 for potato leaf diseases, while for wheat leaf diseases, MLP-Mixer achieved 0.9121 and gMLP achieved 0.9189. These outcomes emphasize the potential of these models in classifying and identifying diseases in crop leaves.
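Gradient centralization, mentioned above, can be sketched in a few lines: before the optimizer step, the mean of each weight vector's gradient is subtracted so that the gradient has zero mean per output unit. The layout below (one row per output unit) and the example numbers are assumptions for illustration, not the study's training code.

```python
def centralize_gradient(grad):
    """Subtract each row's mean from that row of the gradient matrix.

    Assumes grad is a list of rows, one row per output unit of a layer.
    """
    centralized = []
    for row in grad:
        mean = sum(row) / len(row)
        centralized.append([g - mean for g in row])
    return centralized

# Hypothetical gradient for a layer with 2 output units and 3 inputs.
grad = [
    [0.2, -0.1, 0.5],
    [1.0, 1.0, 1.0],
]

centralized = centralize_gradient(grad)
print(centralized)
```

After centralization every row sums to (approximately) zero, and a row whose entries are all equal becomes all zeros, so only the differences between weights in a vector are updated.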
- Research Article
- 10.1142/s2196888825500034
- Feb 18, 2025
- Vietnam Journal of Computer Science
- Jan Kozak + 11 more
Watching sports matches is a beloved pastime for every fan, and predicting the results and betting on which team will win adds to the excitement. Unfortunately, for some users, a lack of self-control can turn initially innocent play into a long-term problem. For this reason, problem gambling is among the crucial issues facing modern societies. Bookmaker companies, therefore, put considerable effort into implementing responsible gambling practices; however, identifying at-risk individuals remains a major challenge. Our paper focuses on identifying users who may currently have, or soon develop, potential issues with gambling. To achieve this, we introduce a set of machine learning methods combined with preprocessing tools that allow us to initially acquire and anonymize user data. This anonymized database is then used to identify high-risk groups. We tested our approach on a large dataset that was obtained and preprocessed specifically for this study. The experiments were conducted on actual data and verified by specialists responsible for identifying gambling problems. Using our method, we successfully identified and detected the early signs of potential gambling problems in multiple users.