Abstract

The usefulness of the data entered by emergency hotline operators (911, 112) can be improved by an automatic system that validates the quality and accuracy of this information and extracts additional data useful for decision-making. To this end, telephone conversations can be converted into text (using speech-to-text), from which the names of people or companies (aggressors, victims), places, and types of crimes or offenses can be extracted (using Named Entity Recognition). Moreover, as this database grows, an intelligent search engine becomes essential for the fast and effective retrieval of useful information.

This paper presents a comparative study. First, we focused on classifying the Arabic texts stored in the system's database. We applied several text representation methods, from TF-IDF (Term Frequency-Inverse Document Frequency) and word-index tokenization to CBOW (Continuous Bag of Words), DBOW (Distributed Bag of Words), and embeddings. We then tested several suitable models, including naive Bayes classifiers and deep neural networks based on LSTM (Long Short-Term Memory) and Word2vec. Finally, we compared all of these approaches with Transformers by applying AraBERT (Arabic Bidirectional Encoder Representations from Transformers).

Second, we built an NER (Named Entity Recognition) model that classifies words or word sequences in the text into four entity types: Suspect, Victim, Crime Location, and Crime Date. To do so, we trained and validated two machine learning algorithms for token classification (named entity recognition), with AraBERT yielding the best results. The NER model was then tested on Arabic crime texts scraped from Facebook to evaluate its performance on new, unstructured Arabic data.
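The NER task is framed as token classification into four entity types. The following minimal sketch assumes a BIO (begin/inside/outside) tagging scheme, which the paper does not specify. It shows how annotated entity spans could be turned into one label per token, and how predicted labels could be decoded back into entities. The tag names, helper functions, example sentence (roughly "Ahmed stole Ali's wallet in Rabat on Monday"), and its annotations are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical tag names for the four entity types; the BIO scheme itself
# is an assumption, not something the paper states.
ENTITY_TYPES = ("SUSPECT", "VICTIM", "CRIME_LOCATION", "CRIME_DATE")


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn (start, end_exclusive, entity_type) spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)  # "O" = outside any entity
    for start, end, etype in spans:
        if etype not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {etype}")
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"invalid span: ({start}, {end})")
        tags[start] = "B-" + etype  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + etype  # continuation tokens
    return tags


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Decode a BIO tag sequence (e.g. model predictions) into (entity_type, text) pairs."""
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        # A B- tag, or an I- tag whose type differs from the open entity, starts a new entity.
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new or tag == "O":
            if current_type is not None:  # close the currently open entity
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = (tag[2:], [token]) if starts_new else (None, [])
        else:  # I- tag continuing the open entity
            current_tokens.append(token)
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Hypothetical example: "Ahmed stole Ali's wallet in Rabat on Monday".
tokens = ["سرق", "أحمد", "محفظة", "علي", "في", "الرباط", "يوم", "الاثنين"]
spans = [(1, 2, "SUSPECT"), (3, 4, "VICTIM"), (5, 6, "CRIME_LOCATION"), (6, 8, "CRIME_DATE")]
tags = spans_to_bio(tokens, spans)
print(list(zip(tokens, tags)))
print(bio_to_spans(tokens, tags))
```

The tags are defined per word. When fine-tuning a subword model such as AraBERT, they would also need to be aligned with the model's subword tokens, a step not shown here.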
This paper also implements similarity search based on a Word2vec model: an intelligent search over unstructured data that helps decision makers retrieve relevant intelligence on which to base their choices.
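The paper does not say how queries are matched against stored texts using Word2vec. This sketch assumes one common approach: represent the query and each document by the average of their word vectors, then rank documents by cosine similarity. The vectors below are hypothetical 3-dimensional toy values. In practice they would come from a trained model, for example the `wv` keyed vectors of a gensim `Word2Vec` model. The helper names and example reports are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def average_vector(tokens: Sequence[str], vectors: Dict[str, Vector], dim: int) -> Vector:
    """Mean of the vectors of in-vocabulary tokens; a zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]  # out-of-vocabulary words are skipped
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query: Sequence[str], documents: List[Sequence[str]],
           vectors: Dict[str, Vector], top_k: int = 3) -> List[Tuple[int, float]]:
    """Rank documents (as token lists) by cosine similarity of averaged word vectors."""
    dim = len(next(iter(vectors.values())))  # assumes a non-empty vocabulary
    q = average_vector(query, vectors, dim)
    scores = [(i, cosine(q, average_vector(doc, vectors, dim)))
              for i, doc in enumerate(documents)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical toy vectors standing in for a trained Word2vec model.
toy_vectors = {
    "سرقة": [0.9, 0.1, 0.0],   # theft
    "سطو": [0.8, 0.2, 0.1],    # robbery
    "محفظة": [0.7, 0.0, 0.2],  # wallet
    "حادث": [0.1, 0.9, 0.1],   # accident
    "سيارة": [0.2, 0.8, 0.3],  # car
    "مرور": [0.0, 0.9, 0.2],   # traffic
}
reports = [["سطو", "محفظة"], ["حادث", "سيارة", "مرور"]]  # robbery report, traffic-accident report
print(search(["سرقة"], reports, toy_vectors))
```

With these toy vectors, the robbery report ranks above the traffic-accident report for the query "theft" even though it does not contain that word. This is the kind of match that plain keyword search misses. Averaging is a simple baseline that ignores word order.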
