The research aimed to extract semantic fields from Arabic online news and to advance Natural Language Processing (NLP) applications for understanding and managing news information effectively. It provides a comprehensive approach to processing and analyzing large volumes of Arabic news data by integrating semantic field analysis, NLP, and computational linguistics. Using quantitative methods, the study collected Arabic news articles, processed them with Python, a programming language widely used in data analysis, and applied various NLP techniques and machine learning models to extract semantic fields accurately. The primary objective was to evaluate the effectiveness of different classification models in categorizing Arabic news and to identify the most suitable model for semantic field extraction. Five classification models were evaluated: Naive Bayes, Support Vector Machine (SVM), Logistic Regression, Random Forest, and Gradient Boosting. Among these, SVM achieved the highest overall accuracy, 90%. It performed best on sports-related news, with a 99% probability and an F1-score of 98%, but was weaker on health and science news, where its F1-score fell to 79%. Overall, the study demonstrated the effectiveness of computational methods, particularly SVM, in classifying Arabic news and extracting semantic fields, thereby advancing NLP and computational linguistics. The findings highlight the potential of SVM for accurate news analysis and the need to further enhance NLP techniques to address multilingual and domain-specific challenges.
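The abstract reports both an overall accuracy and per-category F1-scores, and the two can diverge: a model may score well overall while doing worse on individual categories. The following is a minimal sketch of how these two metrics are computed from true and predicted labels. It is not the study's code; the function names and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of articles whose predicted category matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """Return {category: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted category gains a false positive
            fn[t] += 1  # true category gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Hypothetical toy labels (assumption: not data from the study).
y_true = ["sports"] * 4 + ["health"] * 3 + ["politics"] * 3
y_pred = ["sports", "sports", "sports", "sports",
          "health", "politics", "health",
          "politics", "politics", "health"]

overall = accuracy(y_true, y_pred)
scores = per_class_f1(y_true, y_pred)
print(f"accuracy: {overall:.2f}")
for label in sorted(scores):
    print(f"{label}: F1 = {scores[label]:.2f}")
```

In this toy example the easily separated category gets a perfect F1-score while the two categories that are confused with each other score lower, which is the same kind of gap the abstract describes between sports news and health and science news.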