Markov Chain Monte Carlo for generating ranked textual data

Roy Cerqueti, Valerio Ficcadenti, Gurjeet Dhesi, Marcel Ausloos

doi:10.1016/j.ins.2022.07.137

Abstract

This paper addresses a central theme in applied statistics and information science: the assessment of the stochastic structure of rank-size laws in text analysis. We rank the words in a corpus by frequency, in descending order. The starting point is that ranked data generated in linguistic contexts can be viewed as realisations of a discrete-state Markov chain whose stationary distribution follows a discretisation of the best-fitted rank-size law. The methodological toolkit is Markov Chain Monte Carlo, specifically the Metropolis–Hastings algorithm. The theoretical framework is applied to the rank-size analysis of the hapax legomena occurring in the speeches of the US Presidents. We report a large number of statistical tests supporting the consistency of our methodological proposal. To this end, we also offer arguments that hapaxes are rare (“extreme”) events resulting from memoryless-like processes. Moreover, we show that the considered sample has the stochastic structure of a Markov chain of order one. Importantly, we discuss the versatility of the method, which we consider suitable for deriving similar results in other applied science contexts.
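The idea of a discrete-state chain whose stationary distribution matches a discretised rank-size law can be illustrated with a minimal Metropolis–Hastings sketch. The abstract does not state which rank-size law is fitted or which proposal is used, so everything below is an assumption made for illustration: a Zipf-like target p(r) ∝ r^(-alpha) on ranks 1..n_ranks, a symmetric ±1 random-walk proposal, and the hypothetical names `zipf_weight`, `metropolis_hastings_ranks` and `target_distribution`. It is not the authors' implementation.

```python
import random
from collections import Counter


def zipf_weight(rank, alpha):
    """Unnormalised Zipf-like weight r^(-alpha) (assumed target, not from the paper)."""
    return rank ** (-alpha)


def metropolis_hastings_ranks(n_ranks, alpha, n_steps, seed=0, start=1):
    """Sample ranks in 1..n_ranks with stationary distribution proportional to r^(-alpha).

    The proposal is a symmetric +/-1 random walk; proposals outside
    1..n_ranks are rejected, so the chain stays where it is.
    """
    rng = random.Random(seed)
    current = start
    chain = []
    for _ in range(n_steps):
        proposal = current + rng.choice((-1, 1))
        if 1 <= proposal <= n_ranks:
            # Symmetric proposal: the acceptance ratio reduces to the target ratio.
            ratio = zipf_weight(proposal, alpha) / zipf_weight(current, alpha)
            if rng.random() < min(1.0, ratio):
                current = proposal
        chain.append(current)
    return chain


def target_distribution(n_ranks, alpha):
    """Normalised target probabilities for ranks 1..n_ranks."""
    weights = [zipf_weight(r, alpha) for r in range(1, n_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Usage: compare empirical visit frequencies with the target law.
chain = metropolis_hastings_ranks(n_ranks=10, alpha=1.0, n_steps=200_000, seed=42)
counts = Counter(chain)
empirical = [counts[r] / len(chain) for r in range(1, 11)]
target = target_distribution(10, 1.0)
for r, (e, t) in enumerate(zip(empirical, target), start=1):
    print(f"rank {r:2d}: empirical {e:.3f}  target {t:.3f}")
```

Because each state depends only on the previous one, the generated sequence is by construction a Markov chain of order one, which is the structure the abstract attributes to the observed sample. A different fitted law could be swapped in by replacing `zipf_weight`.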

Journal: Information Sciences
Publication Date: Aug 6, 2022
Citations: 1

Similar Papers

Strong stability and perturbation bounds for discrete Markov chains
Boualem Rabta ... Djamil Aïssani
Linear Algebra and its Applications | VOL. 428
20 Feb 2008

Process Modeling for Energy Usage in "Smart House" System with a Help of Markov Discrete Chain
14 Apr 2017

Small sets and Markov transition densities
Wilfrid S Kendall ... Giovanni Montana
Stochastic Processes and their Applications | VOL. 99
15 Mar 2002

Algorithms for improving efficiency of discrete Markov chains
Nabanita Mukherjee ... Kshitij Khare
Indian Journal of Pure and Applied Mathematics | VOL. 48
01 Dec 2017
