Impact of Genre on Authorship Attribution in Arabic Poetry and Prose

Abstract

This article examines whether authors change their style when writing in different genres, such as poetry and prose, particularly under the metrical constraints of poetry. Using a corpus of works by five modern Egyptian authors, we analyse how effectively the N most frequent words and the N most frequent morphological segments distinguish between poetry and prose written by the same author. Using supervised and unsupervised machine learning techniques, we demonstrate that authors exhibit a distinct stylistic fingerprint in each genre, influenced by the conventions and constraints of each form. Results show that each author uses two different stylistic fingerprints, one for prose and one for poetry. However, when poetry and prose are mixed, the texts of each genre cluster separately, potentially causing author obfuscation. Findings also show that the frequent-word method is sufficient for accurately attributing authorship in mixed-genre texts. In short, the standard linguistic features tested prove resilient across genres and even survive the constraints of formal poetic meter.
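
As a rough illustration of the most-frequent-words idea mentioned in the abstract, the Python sketch below builds relative-frequency vectors over the N most frequent words and assigns an unknown text to the author of the nearest labelled text. It is not the authors' pipeline: the Manhattan distance, the nearest-text rule, the function names (`mfw_vocabulary`, `relative_frequencies`, `attribute`) and the toy transliterated tokens are all assumptions made for illustration.

```python
from collections import Counter


def mfw_vocabulary(docs, n):
    """Return the n most frequent words across all tokenised documents."""
    totals = Counter()
    for tokens in docs:
        totals.update(tokens)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def relative_frequencies(tokens, vocab):
    """Relative frequency of each vocabulary word in one document."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty documents
    return [counts[word] / total for word in vocab]


def manhattan(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def attribute(unknown_tokens, labelled_docs, n=3):
    """Return the author of the labelled document nearest to the unknown one.

    labelled_docs is a list of (author, tokens) pairs (hypothetical format).
    """
    vocab = mfw_vocabulary([tokens for _, tokens in labelled_docs], n)
    target = relative_frequencies(unknown_tokens, vocab)
    best_author, _ = min(
        (
            (author, manhattan(target, relative_frequencies(tokens, vocab)))
            for author, tokens in labelled_docs
        ),
        key=lambda pair: pair[1],
    )
    return best_author


# Toy data (assumed): transliterated function words stand in for real text.
labelled = [
    ("A", ["wa", "wa", "wa", "fi", "x"]),
    ("B", ["fi", "fi", "fi", "min", "y"]),
]
unknown = ["wa", "wa", "fi", "z"]

print(attribute(unknown, labelled, n=3))
```

In a real study the vectors would typically be standardised and computed over far larger N and longer texts; the sketch only shows how function-word frequencies can serve as an author signal.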

Similar Papers
  • Research Article
  • Citations: 5
  • 10.1016/j.cose.2024.104190
Assessing the detection of lateral movement through unsupervised learning techniques
  • Nov 6, 2024
  • Computers & Security
  • Christos Smiliotopoulos + 3 more

  • Research Article
  • Citations: 3
  • 10.29333/iejme/12588
Unsupervised machine learning to classify language dimensions to constitute the linguistic complexity of mathematical word problems
  • Jan 1, 2023
  • International Electronic Journal of Mathematics Education
  • David Bednorz + 1 more

The study examines language dimensions of mathematical word problems and the classification of mathematical word problems according to these dimensions with unsupervised machine learning (ML) techniques. Previous research suggests that the language dimensions are important for mathematical word problems because they influence the linguistic complexity of word problems. Depending on the linguistic complexity, students can face language obstacles when solving mathematical word problems. Much research in mathematics education focuses on analysing linguistic complexity based on theoretically built language dimensions. To date, however, it has been unclear what empirical relationships exist between the linguistic features of mathematical word problems. To address this issue, we used unsupervised ML techniques to reveal latent linguistic structures of 17 linguistic features for 342 mathematical word problems and classify them. The models showed that three- and five-dimensional linguistic structures have the highest explanatory power. Additionally, the authors consider a four-dimensional solution. Mathematical word problems from the three-dimensional solution can be classified into two groups, and those from the three- and five-dimensional solutions into three groups. The findings revealed latent linguistic structures and groups that could have implications for the linguistic complexity of mathematical word problems and that differ from the language dimensions considered theoretically. Therefore, the results point to new design principles for interventions and materials for language education in mathematics learning and teaching.

  • Research Article
  • Citations: 4
  • 10.12688/openreseurope.18593.1
Anomaly Detection in Industrial Processes: Supervised vs. Unsupervised Learning and the Role of Explainability
  • Jan 14, 2025
  • Open Research Europe
  • Avraam Bardos + 9 more

Background: Anomaly detection is vital in industrial settings for identifying abnormal behaviors that suggest faults or malfunctions. Artificial intelligence (AI) offers significant potential to assist humans in addressing these challenges. Methods: This study compares the performance of supervised and unsupervised machine learning (ML) techniques for anomaly detection. Additionally, model-specific explainability methods were employed to interpret the outputs. A novel explainability approach, MLW-XAttentIon, based on causal reasoning in attention networks, was proposed to visualize the inference process of transformer models. Results: Experimental results revealed that unsupervised models perform well without requiring labeled data, offering significant promise. In contrast, supervised models demonstrated greater robustness and reliability. Conclusions: Unsupervised ML techniques present a feasible, resource-efficient option for anomaly detection, while supervised methods remain more reliable for critical applications. The MLW-XAttentIon approach enhances interpretability of transformer-based models, contributing to trust and transparency in AI-driven anomaly detection systems.

  • Research Article
  • Citations: 7
  • 10.54660/ijmor.2023.2.3.70-86
Comparative Analysis of Supervised and Unsupervised Machine Learning for Predictive Analytics
  • Jan 1, 2023
  • International Journal of Management and Organizational Research
  • Ehimah Obuse + 6 more

Predictive analytics has become a crucial tool in data-driven decision-making across industries, leveraging machine learning techniques to extract meaningful patterns from vast datasets. Supervised and unsupervised learning are two primary machine learning approaches widely used for predictive modeling. This study presents a comparative analysis of supervised and unsupervised machine learning techniques, evaluating their effectiveness, applications, and limitations in predictive analytics. Supervised learning algorithms, including decision trees, support vector machines (SVM), random forests, and neural networks, require labeled data to train models for accurate predictions. These algorithms excel in applications such as fraud detection, medical diagnosis, and sales forecasting. In contrast, unsupervised learning techniques like clustering (K-means, DBSCAN) and dimensionality reduction (Principal Component Analysis, Autoencoders) do not rely on labeled data but uncover hidden structures and anomalies in datasets, making them ideal for market segmentation, anomaly detection, and recommendation systems. This study assesses both learning paradigms based on key performance criteria, including accuracy, interpretability, computational efficiency, scalability, and real-world applicability. Findings indicate that supervised learning achieves higher predictive accuracy due to explicit guidance from labeled data but often requires extensive data preprocessing and domain knowledge. Conversely, unsupervised learning provides insights from unstructured data, uncovering hidden relationships, yet lacks definitive accuracy due to the absence of ground truth labels. The selection of the appropriate approach depends on the nature of the dataset, problem complexity, and desired outcome. 
The study concludes that combining both supervised and unsupervised learning in hybrid models enhances predictive performance by leveraging labeled data for accuracy while uncovering deeper insights from unstructured information. Future research should explore AI-driven automation in predictive analytics and the integration of deep learning techniques for improved scalability and real-time applications.

  • Research Article
  • Citations: 130
  • 10.3390/sym12010088
Machine Learning Algorithms for Smart Data Analysis in Internet of Things Environment: Taxonomies and Research Trends
  • Jan 2, 2020
  • Symmetry
  • Mohammed H Alsharif + 3 more

Machine learning techniques will contribute towards making Internet of Things (IoT) symmetric applications among the most significant sources of new data in the future. In this context, network systems are endowed with the capacity to access varieties of experimental symmetric data across a plethora of network devices, study the data information, obtain knowledge, and make informed decisions based on the dataset at their disposal. This study is limited to supervised and unsupervised machine learning (ML) techniques, regarded as the bedrock of the IoT smart data analysis. This study includes reviews and discussions of substantial issues related to supervised and unsupervised machine learning techniques, highlighting the advantages and limitations of each algorithm, and discusses the research trends and recommendations for further study.

  • Research Article
  • Citations: 64
  • 10.1109/access.2020.2974933
Machine Learning Ranks ECG as an Optimal Wearable Biosignal for Assessing Driving Stress
  • Jan 1, 2020
  • IEEE Access
  • Mohamed Elgendi + 1 more

The demand for wearable devices that can detect anxiety and stress when driving is increasing. Recent studies have attempted to use multiple biosignals to detect driving stress. However, collecting multiple biosignals can be complex and is associated with numerous challenges. Determining the optimal biosignal for assessing driving stress can save lives. To the best of our knowledge, no study has investigated both longitudinal and transitional stress assessment using supervised and unsupervised ML techniques. Thus, this study hypothesizes that the optimal signal for assessing driving stress will consistently detect stress using supervised and unsupervised machine learning (ML) techniques. Two different approaches were used to assess driving stress: longitudinal (a combined repeated measurement of the same biosignals over three driving states) and transitional (switching from state to state such as city to highway driving). The longitudinal analysis did not involve a feature extraction phase while the transitional analysis involved a feature extraction phase. The longitudinal analysis consists of a novel interaction ensemble (INTENSE) that aggregates three unsupervised ML approaches: interaction principal component analysis, connectivity-based clustering, and K-means clustering. INTENSE was developed to uncover new knowledge by revealing the strongest correlation between the biosignal and driving stress marker. These three MLs each have their well-known and distinctive geometrical basis. Thus, the aggregation of their result would provide a more robust examination of the simultaneous non-causal associations between six biosignals: electrocardiogram (ECG), electromyogram, hand galvanic skin resistance, foot galvanic skin resistance, heart rate, respiration, and the driving stress marker. INTENSE indicates that ECG is highly correlated with the driving stress marker. 
The supervised ML algorithms confirmed that ECG is the most informative biosignal for detecting driving stress, with an overall accuracy of 75.02%.

  • Research Article
  • Citations: 42
  • 10.1007/s40194-024-01836-z
Monitoring the gas metal arc additive manufacturing process using unsupervised machine learning
  • Sep 23, 2024
  • Welding in the World
  • Giulio Mattera + 2 more

The study aimed to assess the performance of several unsupervised machine learning (ML) techniques in online anomaly detection during surface tension transfer (STT)-based wire arc additive manufacturing. (The term “anomaly” is used here to indicate a departure from expected process behavior which may indicate a quality issue which requires further investigation. The term “defect detection” has often been used previously but the specific imperfection is often indirectly inferred.) Recent advancements in quality monitoring for wire arc manufacturing were reviewed, followed by a comparison of unsupervised ML techniques using welding current and welding voltage data collected during a defect-free deposition process. Both time domain and frequency domain feature extraction techniques were applied and compared. Three analysis methodologies were adopted: ML algorithms such as isolation forest, local outlier factor, and one-class support vector machine. The results highlight that incorporating frequency analysis, such as fast Fourier transform (FFT) and discrete wavelet transform (DWT), for feature extraction based on general frequency response and defined bandwidth frequency response, significantly improves performance, reflected in a 14% increase in F2 score, compared with time-domain feature extraction. Additionally, a deep learning approach employing a convolutional autoencoder (CAE) demonstrated superior performance by processing time-frequency domain data stored as spectrograms obtained through short-time Fourier transform (STFT) analysis. The CAE method outperformed frequency domain analysis and traditional ML approaches, achieving an additional 5% improvement in F2-score. Notably, the F2-score increased significantly from 0.78 in time domain analysis to 0.895 in time-frequency analysis. (The F2 score is the weighted harmonic mean of precision and recall (given a threshold value). Unlike the F1 score, which gives equal weight to precision and recall, the F2 score gives more weight to recall than to precision.) The study emphasizes the potential of utilizing low-cost sensors to develop anomaly detection modules with enhanced accuracy. These findings underscore the importance of incorporating advanced data processing techniques in wire arc additive manufacturing for improved quality control and process optimization.
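
The F2 score described in the abstract above is the β = 2 case of the general F-beta score, F_β = (1 + β²)·P·R / (β²·P + R), where P is precision and R is recall. The short Python sketch below shows how the weighting favours recall; the helper name `f_beta` and the example values are illustrative assumptions, not code or data from the cited study.

```python
def f_beta(precision, recall, beta=2.0):
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta = 1 gives the F1 score.
    """
    if precision == 0 and recall == 0:
        return 0.0  # conventional value when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Same pair of values, swapped: high recall is rewarded more than high precision.
print(round(f_beta(precision=0.5, recall=1.0), 3))  # 0.833
print(round(f_beta(precision=1.0, recall=0.5), 3))  # 0.556
```

Swapping the two inputs changes the score only when β ≠ 1, which is why F2 suits settings where a missed anomaly costs more than a false alarm.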

  • Conference Article
  • Citations: 31
  • 10.1109/iccike47802.2019.9004325
Anomaly Detection on Shuttle data using Unsupervised Learning Techniques
  • Dec 1, 2019
  • S Shriram + 1 more

Many modern-day applications require the ability to identify observations or data that deviate from those considered normal by domain experts. Anomaly detection helps to identify these anomalies and, once they are identified, the system can make the necessary changes. In data mining, this problem is tackled using supervised and unsupervised machine learning techniques. Since in many practical applications the data used will have no labels, unsupervised learning techniques are well suited. This work was aimed at comparing various unsupervised anomaly detection techniques using performance metrics like precision, recall, F-score and area under the curve. The unsupervised learning techniques used in this work are One Class Support Vector Machine (OneClassSVM), Local Outlier Factor (LOF), Isolation Forest (IF) and Elliptic Envelope (EE). Shuttle and satellite datasets were used for experimentation. The performance of these unsupervised learning techniques was compared with supervised learning techniques such as SVM and k-NN. Results show that unsupervised learning techniques are on par or better for anomaly detection compared to supervised learning techniques for the shuttle and satellite datasets.

  • Research Article
  • Citations: 103
  • 10.3390/systems10050130
Detecting Anomalies in Financial Data Using Machine Learning Algorithms
  • Aug 25, 2022
  • Systems
  • Alexander Bakumenko + 1 more

Bookkeeping data free of fraud and errors are a cornerstone of legitimate business operations. The highly complex and laborious work of financial auditors calls for finding new solutions and algorithms to ensure the correctness of financial statements. Both supervised and unsupervised machine learning (ML) techniques nowadays are being successfully applied to detect fraud and anomalies in data. In accounting, it is a long-established problem to detect financial misstatements deemed anomalous in general ledger (GL) data. Currently, widely used techniques such as random sampling and manual assessment of bookkeeping rules become challenging and unreliable due to increasing data volumes and unknown fraudulent patterns. To address the sampling risk and financial audit inefficiency, we applied seven supervised ML techniques inclusive of deep learning and two unsupervised ML techniques such as isolation forest and autoencoders. We trained and evaluated our models on a real-life GL dataset and used data vectorization to resolve journal entry size variability. The evaluation results showed that the best trained supervised and unsupervised models have high potential in detecting predefined anomaly types as well as in efficiently sampling data to discern higher-risk journal entries. Based on our findings, we discussed possible practical implications of the resulting solutions in the accounting and auditing contexts.

  • Conference Article
  • Citations: 2
  • 10.56952/arma-2024-0979
Unsupervised Machine Learning for Delineating Stratigraphy in Subsurface Reservoirs for the Utah FORGE Geothermal Project
  • Jun 23, 2024
  • Ayyaz Mustafa + 3 more

ABSTRACT: The correlation of rock mechanical properties from one well to another across an area of interest poses a classical and ongoing problem in rock mechanics. This work illustrates identification of the mechanical layers/zones in a geothermal reservoir using unsupervised machine learning (ML) techniques. Mechanical stratigraphy was defined using well logs obtained from three wells located at the Utah FORGE geothermal site: 58-32, 16A(78)-32 and 16B(78)-32. The widely accepted unsupervised ML techniques including K-means clustering, Gaussian mixture models, and DBSCAN (density-based spatial clustering of applications with noise) were utilized to generate the rock classes based on similarities/differences in mechanical attributes. The rock mechanical classifications were performed using a combination of parameters including measured log data (compressional and shear wave interval transit times) and augmented features such as Poisson's ratio and Young's modulus. The performance of the ML clustering models was evaluated using the Silhouette index (SI) and Davies-Bouldin index (DBI) criteria. The evaluation measures of predicted classification reflected the effectiveness and applicability of the proposed ML approaches to generate mechanical stratigraphy. Evaluation measures SS and DBI represent the good quality and reliability of proposed classification with higher SI, CHI, and lower DBI scores. The best performance for the proposed clustering model was exhibited by the K-means algorithm with SI, DBI and CHI scores of 0.86, 0.4, and 79, respectively. The proposed mechanical unit cluster models were observed to be consistent with the lithological stratigraphy of the studied wells. This approach is therefore shown to provide efficient and reliable identification of mechanical stratigraphy for FORGE with the capability for application across a wide range of subsurface reservoirs.
1. INTRODUCTION Rocks are formed in different lithostratigraphic units that have a wide range of mechanical characteristics (Boersma et al., 2020). According to Ferrill et al. (2017) and Smart et al. (2014), the mechanical characteristics are often described in terms of stiffness and strength properties, including elastic parameters, tensile strength, and compressive strength (Roche et al., 2013).

  • Research Article
  • Citations: 3
  • 10.1007/s13198-016-0508-1
Empirical analysis of metrics for object oriented multidimensional model of data warehouse using unsupervised machine learning techniques
  • Jun 29, 2016
  • International Journal of System Assurance Engineering and Management
  • Sangeeta Sabharwal + 2 more

A data warehouse provides the foundation for businesses to make informed decisions about day-to-day operations and future strategy. Since its role is so pivotal to the growth and success of the business, its quality is critical. Conceptual models of data warehouses give us great insight into the quality of the developed system during the early stages of the design process. Researchers have proposed a number of metrics to evaluate the quality of these object oriented multidimensional models. Further, for these metrics to be used in practice, empirical evaluation is crucial. There are a number of propositions in the literature that work towards empirical validation of metrics, but most of them are restricted to either statistical techniques or supervised machine learning techniques. In order to empirically validate the metrics, we need to get user responses for a number of schemas and record observations to quantify model quality aspects like understandability, efficiency etc. This can result in personal biases, errors and random outliers which impact the evaluation model. In this paper, we have made a first attempt to assess the relationship between the object oriented multidimensional data warehouse structural metrics and the understandability of its models by using unsupervised machine learning techniques with the aid of a data warehouse quality expert. The results indicate that the proposed metrics have a strong relationship with understandability and in turn with the quality of the data warehouse conceptual models, and that the unsupervised techniques are able to identify this relationship with a high degree of accuracy.

  • Book Chapter
  • Citations: 8
  • 10.1007/978-3-031-23443-9_19
Unsupervised Machine Learning Exploration of Morphological and Haemodynamic Indices to Predict Thrombus Formation in the Left Atrial Appendage
  • Jan 1, 2022
  • Marta Saiz-Vivó + 9 more

Atrial Fibrillation (AF) is the most common cardiac arrhythmia, and it is associated with an increased risk of embolic stroke. It is known that AF-related thrombus formation occurs predominantly in the left atrial appendage (LAA). However, the structural and functional characteristics of the left atria (LA) that promote low velocities and stagnated blood flow, and thus a high risk of thrombogenesis, are still unknown. In this work, we investigated morphological and in-silico haemodynamic indices of the LA and LAA with unsupervised machine learning (ML) techniques, to identify the most relevant features that could subsequently be used to generate thrombus prediction models. A fully automatic pipeline was implemented to extract multiple morphological parameters from a 3D mesh of a LA. Morphological parameters were then combined with particle flow parameters from in-silico fluid simulations. Unsupervised multiple kernel learning (MKL) was used for dimensionality reduction, resulting in a latent space positioning patients based on feature similarity. Clustering applied to the MKL output space estimated clusters with different proportions of thrombus cases. The cluster with the highest risk of thrombus formation was characterised by high values of LAA height, tortuosity and ostium perimeter, as well as total number of flow particles in the LAA and low angle between the LAA and the left superior pulmonary vein, proving the usefulness of unsupervised ML techniques to extract knowledge from the data and to identify AF patients at higher risk of thrombus formation early.
Keywords: Atrial fibrillation; Left atrial appendage; Unsupervised machine-learning; Thrombus

  • Book Chapter
  • Citations: 7
  • 10.1007/978-981-16-3346-1_30
Supervised and Unsupervised Machine Learning Techniques for Multiple Sclerosis Identification: A Performance Comparative Analysis
  • Sep 20, 2021
  • Shikha Jain + 2 more

The identification of multiple sclerosis disease (MSD) is crucial because it is a neurological disease in young people for which early detection is recommended. Accurate classification and segmentation using distinct machine learning techniques plays a significant role in identifying MSD based on brain magnetic resonance (MR) images. In this work, a performance comparative analysis of various supervised and unsupervised machine learning techniques on eighteen gray level textural feature matrix (GLTFM) of brain MR images has been performed. Supervised machine learning (k-nearest neighbor, support vector machine and ensemble learning) classification techniques are utilized for MSD identification and compared with unsupervised machine learning-based clustering techniques (k-means clustering and Gaussian mixture model). Accuracy has been evaluated to measure the proposed system’s performance on unhealthy brain MR images from the e-health dataset and healthy control brain MR images from a private clinical dataset. These metrics are also compared with various state-of-the-art techniques. It has been verified that MSD identification from healthy and unhealthy brain MR images based on the proposed methodology using supervised machine learning techniques yields an accuracy of 96.55%, which is better than existing state-of-the-art techniques and unsupervised machine learning techniques.

  • Research Article
  • Citations: 28
  • 10.1007/s40622-020-00261-7
An inclusive survey on machine learning for CRM: a paradigm shift
  • Dec 1, 2020
  • DECISION
  • Narendra Singh + 2 more

Customer relationship management (CRM) is a tool to enhance customer relationships in any business. Due to the exponential growth of data volume in every field, it is important to develop new techniques for discovering customer knowledge, automating systems and, moreover, achieving the customer satisfaction needed to win customer lifetime value. CRM with machine learning could bring a catalytic change in business. Several supervised and unsupervised machine learning techniques are utilized to improve the customer experience and the profitability of business. This paper reviews the available literature on CRM with machine learning techniques for customer identification, customer attraction, customer retention and customer development. This study reveals that supervised learning techniques account for 48.48% of usage in CRM, unsupervised learning techniques for 15.15%, and other techniques for 9.09%. The paradigm is also shifting from machine learning toward deep learning, with 28.28% of the texts reported as relating to deep learning. Decision tree-based algorithms and support vector machine algorithms are the most utilized supervised learning algorithms. The e-commerce and telecommunication sectors are identified as the most important areas, given the exponential growth of their users, and hence need suitable machine learning techniques for customer satisfaction and business profitability.

  • Conference Article
  • Citations: 3
  • 10.1109/iccca52192.2021.9666391
Movie Subtitle Document Classification Using Unsupervised Machine Learning Approach
  • Dec 17, 2021
  • Md Mehedi Hasan + 4 more

Since the evolution of digital and online text content, automatic document classification has become a significant research issue. A commonly used machine learning approach to this task is the unsupervised approach, in which no human interaction or document labelling is required at any point throughout the whole procedure. This study addressed an approach for movie subtitle document classification using an unsupervised machine learning technique. The dataset was created by collecting almost 500 English movie subtitle files based on the popular movies of IMDB. Two feature extraction methods were used and combined with unsupervised machine learning algorithms, and a dimension reduction technique was used to reduce dimensionality. As unsupervised machine learning techniques, we used Bisecting K-Means, K-Means and Agglomerative Hierarchical Clustering Algorithm; Average link, Single Link and Double link. We assessed that K-means and Bisecting k-means are the best performers of the unsupervised techniques in terms of cluster quality. We addressed the reason for the outliers of the training set and recommended using unsupervised techniques to improve predefining categories and labelling the textual documents in the training set.
