Abstract
Sequences of symbols arise in diverse fields, including genome sequences, written texts and computer code. An interesting question is whether a sequence of symbols contains correlated structures. Existing methods to characterize correlations require a numerical representation of the sequence, so mapping a symbolic sequence into a sequence of numerical values is a key step in correlation analysis. This work proposes a methodology to study correlations in a sequence of symbols. In the first step, the sequence of symbols is mapped into a multivariate numerical sequence formed by unit vectors in a vector space. The main feature of this mapping is that all symbols are equally weighted, which avoids the numerical overrepresentation of any symbol. In the second step, a multivariate version of detrended fluctuation analysis (DFA) is used to quantify correlations in the numerical sequence. Genome sequences (first COVID-19), written English texts and comovements between Bitcoin and gold markets were used to illustrate the performance of the proposed methodology. The results showed that the balanced numerical mapping of symbolic sequences and the multivariate DFA provide valuable insights into the correlations in a sequence of symbols.
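The two steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `one_hot_map`, `multivariate_dfa` and `scaling_exponent` are hypothetical, and the sketch assumes non-overlapping windows, linear detrending, and a fluctuation function that sums squared residuals over all vector components (one common way of extending DFA to multivariate data). The abstract does not specify these details.

```python
import numpy as np

def one_hot_map(sequence):
    """Map each symbol to a unit basis vector, so every symbol has equal weight.

    Returns an array of shape (len(sequence), alphabet_size) and the alphabet.
    """
    alphabet = sorted(set(sequence))
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    x = np.zeros((len(sequence), len(alphabet)))
    x[np.arange(len(sequence)), [index[s] for s in sequence]] = 1.0
    return x, alphabet

def multivariate_dfa(x, scales, order=1):
    """Fluctuation function F(n) for each window size n (assumed formulation)."""
    # Profile: cumulative sum of the mean-centered series, per component.
    profile = np.cumsum(x - x.mean(axis=0), axis=0)
    fluctuations = []
    for n in scales:
        n_windows = profile.shape[0] // n
        t = np.arange(n)
        basis = np.vander(t, order + 1)  # columns: t**order, ..., t**0
        squared_error = 0.0
        for w in range(n_windows):
            segment = profile[w * n:(w + 1) * n]   # shape (n, d)
            coeffs = np.polyfit(t, segment, order)  # fits all d components at once
            residual = segment - basis @ coeffs
            squared_error += np.sum(residual ** 2)  # squared Euclidean norm
        fluctuations.append(np.sqrt(squared_error / (n_windows * n)))
    return np.array(fluctuations)

def scaling_exponent(scales, fluctuations):
    """Slope of log F(n) versus log n."""
    return np.polyfit(np.log(scales), np.log(fluctuations), 1)[0]
```

Under these assumptions, an uncorrelated symbol sequence should give a scaling exponent close to 0.5, while values above 0.5 would point to persistent long-range correlations. For example, `x, _ = one_hot_map("ACGTTGCA...")` followed by `scaling_exponent(scales, multivariate_dfa(x, scales))` with `scales = [16, 32, 64, 128, 256]` gives a single exponent for the whole symbolic sequence.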