A Comparison of Machine Learning Approaches for Detecting Misogynistic Speech in Urban Dictionary

Abstract

Recent moves to treat misogyny as a hate crime have renewed pressure on owners of web properties to detect and remove misogynistic speech. This paper considers the use of deep learning techniques for detecting misogyny in Urban Dictionary, a crowdsourced online dictionary of slang words and phrases. We compare two deep learning techniques, bidirectional long short-term memory (Bi-LSTM) and bidirectional gated recurrent unit (Bi-GRU) networks, with three more conventional machine learning techniques: logistic regression, Naive Bayes classification, and Random Forest classification. We find that both deep learning techniques detect misogyny in Urban Dictionary with greater accuracy than any of the conventional techniques examined.
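As a rough illustration of the kind of conventional baseline named in the abstract, the sketch below implements a multinomial Naive Bayes text classifier with add-one smoothing and scores it by accuracy on a held-out set. It is not the authors' implementation: the class name `NaiveBayesText`, the `accuracy` helper, whitespace tokenization, and the toy data with neutral placeholder tokens and labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative)."""

    def fit(self, docs, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            for word in doc.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return hits / len(labels)


# Hypothetical toy data: placeholder tokens and labels, not real dictionary entries.
train_docs = ["alpha beta beta", "beta gamma", "delta epsilon", "epsilon epsilon zeta"]
train_labels = ["flagged", "flagged", "ok", "ok"]
test_docs = ["beta alpha", "zeta delta"]
test_labels = ["flagged", "ok"]

model = NaiveBayesText().fit(train_docs, train_labels)
baseline_accuracy = accuracy(model, test_docs, test_labels)
print(f"Naive Bayes baseline accuracy: {baseline_accuracy:.2f}")
```

In a comparison like the one the abstract describes, each classifier would be scored with the same accuracy function on the same held-out split, so that the reported differences reflect the models rather than the evaluation setup.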

Similar Papers
  • Research Article
  • Citations: 1
  • 10.1002/ima.22905
COVID‐19: A systematic review of prediction and classification techniques
  • May 11, 2023
  • International Journal of Imaging Systems and Technology
  • Om Ramakisan Varma + 2 more

COVID‐19 has affected more than 760 million people all over the world, as per the latest record of the WHO. The rapid proliferation of COVID‐19 patients not only created a health emergency but also led to an economic crisis. An early and accurate diagnosis of COVID‐19 can help in combating this deadly virus. In line with this, researchers have proposed several machine learning (ML) and deep learning (DL) techniques for detecting COVID‐19 since 2020. This article presents currently available manual diagnosis methods along with their limitations. It also provides an extensive survey of ML and DL techniques that can support medical professionals in the precise diagnosis of COVID‐19. ML methods, namely K‐nearest neighbor, support vector machine (SVM), artificial neural network, decision tree, naive bayes, and DL methods, viz. deep neural network, convolutional neural network (CNN), region‐based convolutional neural network, and long short‐term memories, are explored. It also provides details of the latest COVID‐19 open‐source datasets, consisting of x‐ray and computed tomography scan images. A comparative analysis of ML and DL techniques developed for COVID‐19 detection in terms of methodology, datasets, sample size, type of classification, performance, and limitations is also done. It has been found that SVM is the most frequently used ML technique, while CNN is the most commonly used DL technique for COVID‐19 detection. The challenges of an existing dataset have been identified, including size and quality of datasets, lack of labeled datasets, severity level, data imbalance, and privacy concerns. It is recommended that there is a need to establish a benchmark dataset that overcomes these challenges to enhance the effectiveness of ML and DL techniques. Further, hurdles in implementing ML and DL techniques in real‐time clinical settings have also been highlighted. 
In addition, the motivation noticed from the existing methods has been considered for extending the research with an optimized DL model, which attained improved performance using statistical and deep features. The optimized deep model performs better than 90% based on efficient features and proper classifier tuning.

  • Research Article
  • Citations: 60
  • 10.1016/j.matpr.2020.11.351
Deep learning for material synthesis and manufacturing systems: A review
  • Jan 1, 2021
  • Materials Today: Proceedings
  • V Bhuvaneswari + 5 more


  • Conference Article
  • 10.1115/isfa2020-9643
Transferable Deep Learning for In-Situ Tool Wear Diagnosis
  • Jul 8, 2020
  • Matthew Russell + 1 more

Emerging deep learning (DL) techniques, which have demonstrated the superior capability to learn complex patterns and interrelations from multivariate data, provide promising solutions to characterize and model complex system that cannot be accurately described by conventional machine learning techniques. Hence, DL techniques have been extensively studied for condition monitoring, diagnosis, and remaining life prediction of manufacturing machine and components. One challenge associated with DL techniques is that the accuracy and reliability of DL models would vary significantly with the data amount, variety, and machine operating scenarios that are used to train the models. If the trained model is applied beyond the training scenarios, an internal covariate shift problem would occur and thereby damage the model reliability. To address this challenge, the DL models should not only extract hierarchical features from the input data, but also study the similarities and differences among data collected from different scenarios and include the discovered similarities in the feature extraction mechanism to generalize models to a broad application. This paper presents a trial to develop a transferable convolutional neural network (CNN) for in-situ diagnosis tool wear severity under different operating conditions.

  • Research Article
  • Citations: 1
  • 10.48084/etasr.7631
Advancing Email Spam Classification using Machine Learning and Deep Learning Techniques
  • Aug 2, 2024
  • Engineering, Technology & Applied Science Research
  • Meaad Hamad Alsuwit + 2 more

Email communication has become integral to various industries, but the pervasive issue of spam emails poses significant challenges for service providers. This research proposes a study leveraging Machine Learning (ML) and Deep Learning (DL) techniques to effectively classify spam emails. Methods such as Logistic Regression (LR), Naïve Bayes (NB), Random Forest (RF), and Artificial Neural Networks (ANNs) are employed to construct robust models for accurate spam detection. By amalgamating these techniques, the aim is to enhance efficiency and precision in spam detection, aiding email and IoT service providers in mitigating the detrimental effects of spam. Evaluation of the proposed models revealed promising outcomes. LR, RF, and NB achieved an impressive accuracy of 97% and an F1-Score of 97.5%, showcasing their efficacy in accurately identifying spam emails. The ANN model demonstrated slightly superior performance, with 98% accuracy and 97.5% F1-score, suggesting potential improvements in accuracy and robustness in spam filtering systems. These findings underscore the viability of both traditional ML algorithms and DL approaches in addressing the challenges of email spam classification, paving the way for more effective spam detection mechanisms in electronic communication platforms.

  • Research Article
  • Citations: 1
  • 10.1016/j.imu.2024.101530
A recall-optimised machine learning framework for small data improves risk stratification for Hirschsprung's disease
  • Jan 1, 2024
  • Informatics in Medicine Unlocked
  • Emilie G Jaroy + 4 more


  • Research Article
  • Citations: 3
  • 10.11591/ijeecs.v35.i2.pp1244-1252
Android malware detection using GIST based machine learning and deep learning techniques
  • Aug 1, 2024
  • Indonesian Journal of Electrical Engineering and Computer Science
  • Ponnuswamy Udayakumar + 5 more

In today’s digital world, Android phones play a vital part in many facets of people’s personal and professional lives. Android phones are great for getting things done faster and in a more organized way. The number of malicious applications has been growing proportionately. Since the play store offers millions of apps, detection of malware apps is a challenging task. In this paper, a methodology is introduced for detecting malware in Android applications through the utilization of global image shape transform (GIST) features extracted from grayscale images of the applications. The dataset comprises samples of both malware and benign apps collected from the virus share website. After converting the apps into grayscale images, GIST features are extracted to capture their global spatial layout. Various machine learning (ML) algorithms, such as logistic regression (LR), k-nearest neighbour (KNN), AdaBoost, decision tree (DT), Naïve Bayes (NB), random forest (RF), support vector machine (SVM), extra tree classifier (ETC), and gradient boosting (GB), are employed to classify the applications according to their GIST features. Furthermore, a feed forward neural network (FFNN) is utilized as a deep learning (DL) technique to further improve the accuracy of classification. The performance of each algorithm is evaluated using metrics such as accuracy, precision and recall. The results demonstrated that the FFNN achieves superior accuracy compared to traditional ML classifiers, indicating its effectiveness in detecting malware in Android apps.

  • Research Article
  • Citations: 26
  • 10.3390/jcm8020172
Exploration of Machine Learning for Hyperuricemia Prediction Models Based on Basic Health Checkup Tests.
  • Feb 2, 2019
  • Journal of Clinical Medicine
  • Sangwoo Lee + 2 more

Background: Machine learning (ML) is a promising methodology for classification and prediction applications in healthcare. However, this method has not been practically established for clinical data. Hyperuricemia is a biomarker of various chronic diseases. We aimed to predict uric acid status from basic healthcare checkup test results using several ML algorithms and to evaluate the performance. Methods: We designed a prediction model for hyperuricemia using a comprehensive health checkup database designed by the classification of ML algorithms, such as discrimination analysis, K-nearest neighbor, naïve Bayes (NBC), support vector machine, decision tree, and random forest classification (RFC). The performance of each algorithm was evaluated and compared with the performance of a conventional logistic regression (CLR) algorithm by receiver operating characteristic curve analysis. Results: Of the 38,001 participants, 7705 were hyperuricemic. For the maximum sensitivity criterion, NBC showed the highest sensitivity (0.73), and RFC showed the second highest (0.66); for the maximum balanced classification rate (BCR) criterion, RFC showed the highest BCR (0.68), and NBC showed the second highest (0.66) among the various ML algorithms for predicting uric acid status. In a comparison to the performance of NBC (area under the curve (AUC) = 0.669, 95% confidence intervals (CI) = 0.669–0.675) and RFC (AUC = 0.775, 95% CI 0.770–0.780) with a CLR algorithm (AUC = 0.568, 95% CI = 0.563–0.571), NBC and RFC showed significantly better performance (p < 0.001). Conclusions: The ML model was superior to the CLR model for the prediction of hyperuricemia. Future studies are needed to determine the best-performing ML algorithms based on data set characteristics. We believe that this study will be informative for studies using ML tools in clinical research.

  • Conference Article
  • Citations: 91
  • 10.23919/date.2019.8714862
PUFs Deep Attacks: Enhanced modeling attacks using deep learning techniques to break the security of double arbiter PUFs
  • Mar 1, 2019
  • Mahmoud Khalafalla + 1 more

In the past decade and a half, physical unclonable functions (PUFs) have been introduced as a promising cryptographic primitive for hardware security applications. Since then, the race between proposing new complex PUF architectures and new attack schemes to break their security has been ongoing. Although modeling attacks using conventional machine learning techniques were successful against many PUFs, there are still some delay-based PUF architectures which remain unbroken against such attacks, such as the double arbiter PUFs. These stronger complex PUFs have the potential to be a promising candidate for key generation and authentication applications. This paper presents an in-depth analysis of modeling attack using deep learning (DL) techniques against double arbiter PUFs (DA-PUFs). Unlike more conventional machine learning techniques such as logistic regression and support vector machines, DL results show enhanced prediction accuracy of the attacked PUFs, thus pushing up the boundaries of modeling attacks to break more complex architectures. The attack on 3-1 DAPUFs has improved accuracy of over 86% (compared to previous research achieving a maximum of 76%) and the 4-1 DAPUFs accuracy ranges between 71%-81.5% (compared to previous research of maximum 63%). This research is crucial for analyzing security of existing and future PUF architectures, confirming that as DL computations become more widely accessible, designers will need to hide the PUFs CRP relationship from attackers.

  • Research Article
  • Citations: 38
  • 10.46481/jnsps.2021.308
Sentiment Analysis using various Machine Learning and Deep Learning Techniques
  • Nov 29, 2021
  • Journal of the Nigerian Society of Physical Sciences
  • V Umarani + 2 more

Sentiment analysis has gained a lot of attention from researchers in the last year because it has been widely applied to a variety of application domains such as business, government, education, sports, tourism, biomedicine, and telecommunication services. Sentiment analysis is an automated computational method for studying or evaluating sentiments, feelings, and emotions expressed as comments, feedbacks, or critiques. The sentiment analysis process can be automated using machine learning techniques, which analyses text patterns faster. The supervised machine learning technique is the most used mechanism for sentiment analysis. The proposed work discusses the flow of sentiment analysis process and investigates the common supervised machine learning techniques such as multinomial naive bayes, Bernoulli naive bayes, logistic regression, support vector machine, random forest, K-nearest neighbor, decision tree, and deep learning techniques such as Long Short-Term Memory and Convolution Neural Network. The work examines such learning methods using standard data set and the experimental results of sentiment analysis demonstrate the performance of various classifiers taken in terms of the precision, recall, F1-score, RoC-Curve, accuracy, running time and k fold cross validation and helps in appreciating the novelty of the several deep learning techniques and also giving the user an overview of choosing the right technique for their application.

  • Abstract
  • 10.1016/j.spinee.2022.07.024
P68. Machine learning algorithms for predicting patient-controlled analgesia use after minimally invasive spine surgery
  • Aug 19, 2022
  • The Spine Journal
  • Daniel Shinn + 8 more


  • Research Article
  • Citations: 3
  • 10.54614/electrica.2022.22031
Multiple Classification of Cyber Attacks Using Machine Learning
  • May 24, 2022
  • ELECTRICA
  • Ebu Yusuf Güven + 4 more

With the rapid growth of technology, the Internet’s use and the number of devices connected to it are growing at a breakneck pace. As a result of this development, network traffic has increased in volume and has become more vulnerable. The focus has been on the development of learning intrusion detection systems in order to detect sophisticated and undetected threats. Because machine learning-based models achieve great accuracy in a short amount of time, they are commonly utilized in intrusion detection systems. Multiple classifications were made in this study to detect assaults on network traffic using machine learning. The model was created using the CICIDS2017 data set, which comprises both current and historical attacks. The high-performance computer was used to rapidly conduct tests on the CICIDS2017 data set, which contains around 2.8 million rows of data. We improved the performance of the machine learning models we developed by cleaning, normalizing, oversampling for an unbalanced number of labels, and reducing the size of the data set using feature selection methods. The random forest, decision tree, logistic regression, and Naive Bayes classifiers were all implemented on the pre-processed data set, and it was observed that the random forest classifier had the highest accuracy of 99.94%. Cite this article as: E. Yusuf Güven, S. Gülgün, C. Manav, B. Bakır and G. Zeynep Gürkaş Aydın, "Multiple classification of cyber attacks using machine learning," Electrica., 22(2), 313-320, 2022.

  • Research Article
  • 10.1051/itmconf/20257403008
Comparative evaluation of deep learning and machine learning techniques for sentiment analysis of electronic product review data
  • Jan 1, 2025
  • ITM Web of Conferences
  • Archana Nagelli + 2 more

The primary thoughts, perceptions, attitudes, feedback, and even emotions expressed by people on social networking and e-commerce sites are the primary focus of sentiment analysis also referred to as opinion mining. It provides meaningful information to various stakeholders and customers in influencing their next move. However, the biggest challenge is the extraction of relevant information from the tremendous data. Machine learning and deep learning techniques have obtained remarkable success in exemplifying and classifying information. Machine learning works with the binary classification of information, whereas deep learning provides automatic feature detection. A study was carried out to extract the relevant information from the Amazon reviews dataset of electronics products. The Naïve Bayes, support vector machine, decision tree, convolution neural network, long short term memory, recursive neural networks, and recurrent neural networks were used on the dataset after applying different data preprocessing. To evaluate the performance of various machine learning and deep learning techniques, frameworks, F1 score, precision, recall as well as, accuracy was used. The results suggest that deep learning techniques have outperformed the machine learning techniques, and RNN shows the highest accuracy among all the techniques.

  • Research Article
  • Citations: 2
  • 10.46298/arima.9291
Étude comparative d'algorithmes d'apprentissage artificiel pour la reconnaissance faciale
  • Mar 7, 2024
  • Revue Africaine de Recherche en Informatique et Mathématiques Appliquées
  • Atsu Alagah Komlavi + 2 more

Background: The fundamental need for authentication and identification of humans using their physiological, behavioral or biological characteristics continues to be applied extensively to secure localities, property, financial transactions, etc. Biometric systems based on face characteristics continue to attract the attention of researchers and of major public and private services. In the literature, many methods have been deployed by different authors. The best performance must be found in order to be able to recommend the most effective method. So, the main objective of this article is to make a comparative study of different existing techniques. Methods: A biometric system is generally composed of four stages: acquisition of facial images, preprocessing, extraction of characteristics and finally classification. In this work, the focus is on machine learning algorithms for classification. These algorithms are: Support Vector Machines (SVM), Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Random Forests (RF), Logistic Regression (LR), Naive Bayesian Classification (NB: Naive Bayes’ Classifiers) and deep learning techniques such as Convolutional Neural Networks (CNN). The comparison criterion is the average performance, calculated using three performance measures: recognition rate, confusion matrix, and the Area Under Receiver Operating Characteristic (ROC) curve. Results: Based on this criterion, the performance comparison of selected machine learning algorithms shows that CNN is the best, with an average performance of 100.00% on the ORL face database. However, on the YALE database, classical algorithms such as artificial neural networks have obtained the best performances, the highest being a rate of 100%. Discussion: Deep learning techniques are very efficient in image classification as proven by the results on the ORL database. 
However, their inefficiency on YALE face database is due to the small size of this database which is inappropriate for some deep learning algorithms. But this weakness can be corrected by image augmentation techniques. The comparison of these results with existing state-of-the-art methods is nearly the same. Authors achieved performances of 94.82%, 95.79%, 96.15%, 96.44%, 97.27%, 98.52% and 98.95% for NB, KNN, RF, LR, ANN, SVM and CNN classifiers, respectively. Finally, in depth discussion, it is concluded that between all these approaches which are useful in face recognition, the CNN is the best classification algorithm.

  • Research Article
  • Citations: 16
  • 10.1007/s11517-020-02256-z
Image-based state-of-the-art techniques for the identification and classification of brain diseases: a review.
  • Sep 22, 2020
  • Medical & Biological Engineering & Computing
  • Ejaz Ul Haq + 4 more

Detection and classification methods have a vital and important role in identifying brain diseases. Timely detection and classification of brain diseases enable an accurate identification and effective management of brain impairment. Brain disorders are commonly most spreadable diseases and the diagnosing process is time-consuming and highly expensive. There is an utmost need to develop effective and advantageous methods for brain diseases detection and characterization. Magnetic resonance imaging (MRI), computed tomography (CT), and other various brain imaging scans are used to identify different brain diseases and disorders. Brain imaging scans are the efficient tool to understand the anatomical changes in brain in fast and accurate manner. These different brain imaging scans used with segmentation techniques and along with machine learning and deep learning techniques give maximum accuracy and efficiency. This paper focuses on different conventional approaches, machine learning and deep learning techniques used for the detection, and classification of brain diseases and abnormalities. This paper also summarizes the research gap and problems in the existing techniques used for detection and classification of brain disorders. Comparison and evaluation of different machine learning and deep learning techniques in terms of efficiency and accuracy are also highlighted in this paper. Furthermore, different brain diseases like leukoariaosis, Alzheimer's, Parkinson's, and Wilson's disorder are studied in the scope of machine learning and deep learning techniques.

  • Research Article
  • Citations: 15
  • 10.1186/s13677-023-00577-6
Innovative deep learning techniques for monitoring aggressive behavior in social media posts
  • Jan 16, 2024
  • Journal of Cloud Computing
  • Huimin Han + 5 more

The study aims to evaluate and compare the performance of various machine learning (ML) classifiers in the context of detecting cyber-trolling behaviors. With the rising prevalence of online harassment, developing effective automated tools for aggression detection in digital communications has become imperative. This research assesses the efficacy of Random Forest, Light Gradient Boosting Machine (LightGBM), Logistic Regression, Support Vector Machine (SVM), and Naive Bayes classifiers in identifying cyber troll posts within a publicly available dataset. Each ML classifier was trained and tested on a dataset curated for the detection of cyber trolls. The performance of the classifiers was gauged using confusion matrices, which provide detailed counts of true positives, true negatives, false positives, and false negatives. These metrics were then utilized to calculate the accuracy, precision, recall, and F1 scores to better understand each model’s predictive capabilities. The Random Forest classifier outperformed other models, exhibiting the highest accuracy and balanced precision-recall trade-off, as indicated by the highest true positive and true negative rates, alongside the lowest false positive and false negative rates. LightGBM, while effective, showed a tendency towards higher false predictions. Logistic Regression, SVM, and Naive Bayes displayed identical confusion matrix results, an anomaly suggesting potential data handling or model application issues that warrant further investigation. The findings underscore the effectiveness of ensemble methods, with Random Forest leading in the cyber troll detection task. The study highlights the importance of selecting appropriate ML algorithms for text classification tasks in social media contexts and emphasizes the need for further scrutiny into the anomaly observed among the Logistic Regression, SVM, and Naive Bayes results. 
Future work will focus on exploring the reasons behind this occurrence and the potential of deep learning techniques in enhancing detection performance.
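The abstract above describes deriving accuracy, precision, recall, and F1 from confusion-matrix counts. A minimal sketch of that arithmetic follows; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Derive standard binary metrics from confusion-matrix counts.

    tp, tn, fp, fn: true positives, true negatives, false positives,
    false negatives. Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts chosen only to exercise the formulas.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because every metric here is a deterministic function of the four counts, classifiers that produce identical confusion matrices necessarily report identical scores, which is why the identical results noted in the abstract point to a data handling or pipeline issue rather than to the metrics themselves.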
