Abstract

This contribution presents the first results of a study conducted on a corpus of 7000 posts collected on the social network Reddit during the 2016 American presidential campaign. The research is a collaboration between Berkeley D-Lab, which shared the corpus, LSI - CentraleSupelec, and CUBE. Thanks to funding from the Anti-Defamation League, the corpus was labeled so that Machine Learning techniques could be applied: human analysts labeled 400 posts as “hate speech”. Galofaro, Toffano and Doan applied to both sub-corpora (hate speech and non-hate speech) an analysis technique inspired by Greimas’s structural semantics, Eco’s semiotics, and Quantum Information Retrieval (van Rijsbergen). Each text was formalized as a semantic network using the HAL (Hyperspace Analogue to Language) technique. We then measured the semantic similarity between two key words, formalized as two word vectors, with the classical cosine similarity measure, and compared it with the degree of quantum correlation between them, measured with the Born rule. This correlation, which is linked to the co-occurrence of the word vectors in the same contexts, extracts from those contexts information useful for characterizing the semantic relationships under consideration (“presence of correlation”, “absence of correlation” or “presence of anti-correlation”). In this way, the new technique makes it possible to overcome some critical limitations of the Machine Learning techniques currently in use, because it is based on the meaning of the text rather than on the way the human analyst labels the corpus.
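
As an illustration of the pipeline described above, the following is a minimal sketch and not the authors' implementation. It assumes a simple HAL-style sliding window with linearly decaying weights, and it takes the Born-rule quantity to be the squared inner product of the two normalized word vectors. The abstract does not specify the exact correlation measure used in the study (in particular, how anti-correlation is detected), so that part is not reproduced here. The function names, the window size and the toy sentence are hypothetical.

```python
import math


def hal_vectors(tokens, window=5):
    """Build HAL-style word vectors from a token list.

    For each word, count the words that precede it within `window`
    positions, weighting closer neighbours more heavily. The final
    vector concatenates the row (preceding context) and the column
    (following context) of the co-occurrence matrix.
    """
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    matrix = [[0.0] * n for _ in range(n)]
    for pos, word in enumerate(tokens):
        for dist in range(1, window + 1):
            if pos - dist < 0:
                break
            ctx = tokens[pos - dist]
            # Assumed weighting: window - dist + 1 (adjacent words weigh most).
            matrix[index[word]][index[ctx]] += window - dist + 1
    vectors = {}
    for w, i in index.items():
        row = matrix[i]
        col = [matrix[j][i] for j in range(n)]
        vectors[w] = row + col
    return vectors


def cosine(u, v):
    """Classical cosine similarity between two real vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def born_probability(u, v):
    """Born-rule probability |<u|v>|^2 for the normalized vectors.

    For real vectors this equals the squared cosine similarity.
    This is an assumption about how the rule is applied here.
    """
    return cosine(u, v) ** 2


if __name__ == "__main__":
    # Toy sentence invented for illustration; not taken from the corpus.
    toy = "the vote was rigged and the vote was stolen".split()
    vecs = hal_vectors(toy, window=3)
    print(cosine(vecs["rigged"], vecs["stolen"]))
    print(born_probability(vecs["rigged"], vecs["stolen"]))
```

Under these assumptions the two measures carry the same information up to sign, since the second is simply the square of the first. A measure that separates correlation from anti-correlation, as the abstract describes, would require the additional structure of the authors' formalism.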

