An overview of Bamum phonology and orthography, with an additional focus on character and word frequencies in recent poetry

Abstract

While the history of the Bamum language has been well documented, the particulars of its phonology and orthography have received only fragmentary treatment. Our contribution to the literature is a synthesis of earlier work, pulling together the disparate threads that have arisen from the German, French, and English scholarly traditions. We discuss in turn the phonology (noting in particular discrepancies between sources in accounting for the vowel inventory), an analysis of the writing system, and the frequency distributions of characters and words. The discussion of historical phonology may shed light on the development of orthographic principles, while the preliminary exploration of statistical patterns observed in the latest phase of the script, known as A ka u ku mfɛmfɛ, should serve as a useful point of reference. We also indicate directions for future research on a wider scale, incorporating corpus studies of both the Bamum language and the invented Shümom language, which also uses the Bamum script.
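The character- and word-frequency tabulation the abstract refers to can be sketched with Python's standard library. The sample string below is a hypothetical stand-in (a Latin transliteration), not actual Bamum-script data; the same code works unchanged on text in the Bamum Unicode blocks.

```python
from collections import Counter

# Hypothetical sample line standing in for a Bamum corpus text.
text = "mfon mfon nshut yen mfon"

word_freq = Counter(text.split())           # word frequency distribution
char_freq = Counter(text.replace(" ", ""))  # character frequency distribution

print(word_freq.most_common(2))  # two most frequent words with counts
print(char_freq.most_common(3))  # three most frequent characters
```

For a real corpus study, the tokenization step would need to respect Bamum script conventions rather than splitting on spaces.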

Similar Papers
  • Research Article
  • Citations: 38
  • 10.1177/1536867x1801800205
Content Analysis: Frequency Distribution of Words
  • Jun 1, 2018
  • The Stata Journal: Promoting communications on statistics and Stata
  • Mehmet F Dicle + 1 more

Many academic fields use content analysis. At the core of most common content analysis lies frequency distribution of individual words. Websites and documents are mined for usage and frequency of certain words. In this article, we introduce a community-contributed command, wordfreq, to process content (online and local) and to prepare a frequency distribution of individual words. Additionally, another community-contributed command, wordcloud, is introduced to draw a simple word cloud graph for visual analysis of the frequent usage of specific words.

  • Conference Article
  • Citations: 7
  • 10.1109/mipro.2014.6859820
Complex networks measures for differentiation between normal and shuffled Croatian texts
  • May 1, 2014
  • Domagoj Margan + 2 more

This paper studies the properties of Croatian texts via complex networks. We present network properties of normal and shuffled Croatian texts for different shuffling principles: on the sentence level and on the text level. In both experiments we preserved the vocabulary size and the word and sentence frequency distributions. Additionally, in the first shuffling approach we preserved the sentence structure of the text and the number of words per sentence. The obtained results showed that degree rank distributions exhibit no substantial deviation in shuffled networks, and that strength rank distributions are preserved due to the same word frequencies. Therefore, the standard approach to studying the structure of linguistic co-occurrence networks showed no clear difference among the topologies of normal and shuffled texts. Finally, we showed that the in- and out-selectivity values from shuffled texts are consistently below the selectivity values calculated from normal texts. Our results corroborate that the node selectivity measure can capture structural differences between original and shuffled Croatian texts.

  • Research Article
  • Citations: 15
  • 10.1016/j.jml.2023.104497
Word length and frequency effects on text reading are highly similar in 12 alphabetic languages
  • Dec 20, 2023
  • Journal of Memory and Language
  • Victor Kuperman + 2 more


  • Research Article
  • Citations: 8
  • 10.1080/01611194.2016.1206753
Hoaxing statistical features of the Voynich Manuscript
  • Sep 13, 2016
  • Cryptologia
  • Gordon Rugg + 1 more

In a previous article, the first author demonstrated that simple materials and techniques could produce meaningless text of comparable complexity to the text in the Voynich Manuscript, at a speed which made a hoax a feasible explanation. The table and grille method described in that article also replicated the main qualitative features of the text in the Voynich Manuscript. In this article, the authors demonstrate that the same table and grille method can also replicate the main quantitative statistical features of the text in the Voynich Manuscript, namely a distribution of word frequencies that mimics Zipf’s distribution, a symmetrical distribution of word length frequencies, and a non-homogeneous distribution of words and of syllables across a corpus of text produced using this method. The main unusual qualitative and quantitative features of the Voynich Manuscript are therefore explicable as products of a low-technology hoax, with no need to invoke an undiscovered new type of code and/or the presence of meaningful text in the manuscript.

  • Research Article
  • Citations: 1
  • 10.1108/00220410610688750
Aggregation consistency and frequency of Chinese words and characters
  • Sep 1, 2006
  • Journal of Documentation
  • Clément Arsenault

Purpose – Aims to measure syllable aggregation consistency of Romanized Chinese data in the title fields of bibliographic records. Also aims to verify if the term frequency distributions satisfy conventional bibliometric laws.
Design/methodology/approach – Uses Cooper's interindexer formula to evaluate aggregation consistency within and between two sets of Chinese bibliographic data. Compares the term frequency distributions of polysyllabic words and monosyllabic characters (for vernacular and Romanized data) with the Lotka and the generalised Zipf theoretical distributions. The fits are tested with the Kolmogorov-Smirnov test.
Findings – Finds high internal aggregation consistency within each data set but some aggregation discrepancy between sets. Shows that word (polysyllabic) distributions satisfy Lotka's law but that character (monosyllabic) distributions do not abide by the law.
Research limitations/implications – The findings are limited to only two sets of bibliographic data (for aggregation consistency analysis) and to one set of data for the frequency distribution analysis. Only two bibliometric distributions are tested. Internal consistency within each database remains fairly high, so the main argument against syllable aggregation does not appear to hold true. The analysis revealed that Chinese words and characters behave differently in terms of frequency distribution but that there is no noticeable difference between vernacular and Romanized data. The distribution of Romanized characters exhibits the worst fit to either Lotka's or Zipf's law, which indicates that Romanized data in aggregated form appear to be a preferable option.
Originality/value – Provides empirical data on consistency and distribution of Romanized Chinese titles in bibliographic records.

  • Research Article
  • Citations: 4
  • 10.3389/fpsyg.2024.1208029
Word frequency and cognitive effort in turns-at-talk: turn structure affects processing load in natural conversation.
  • Jun 5, 2024
  • Frontiers in psychology
  • Christoph Rühlemann + 1 more

Frequency distributions are known to widely affect psycholinguistic processes. The effects of word frequency in turns-at-talk, the nucleus of social action in conversation, have, by contrast, been largely neglected. This study probes this gap by applying corpus-linguistic methods to the conversational component of the British National Corpus (BNC) and the Freiburg Multimodal Interaction Corpus (FreMIC). The latter includes continuous pupil-size measures of participants in the recorded conversations, allowing for a systematic investigation of patterns in the contained speech and language on the one hand, and of the concurrent processing costs they may incur in speakers and recipients on the other. We test a first hypothesis in this vein, analyzing whether word frequency distributions within turns-at-talk are correlated with interlocutors' processing effort during the production and reception of these turns. Turns are found to show a regular distribution pattern of word frequency, with highly frequent words in turn-initial positions, mid-range frequency words in turn-medial positions, and low-frequency words in turn-final positions. Speakers' pupil size tends to increase during the course of a turn, reaching a climax toward the turn end. Notably, the observed decrease in word frequency within turns is inversely correlated with the observed increase in pupil size in speakers, but not in recipients, with steeper decreases in word frequency going along with steeper increases in pupil size in speakers. We discuss the implications of these findings for theories of speech processing, turn structure, and information packaging. Crucially, we propose that the intensification of processing effort in speakers during a turn is due to an informational climax, which entails a progression from high-frequency, low-information words through intermediate levels to low-frequency, high-information words. At least in English conversation, interlocutors seem to exploit this pattern to achieve efficiency in conversational interaction, creating a regularly recurring distribution of processing load across speaking turns, which aids smooth turn transitions, content prediction, and effective information transfer.

  • Research Article
  • Citations: 580
  • 10.3758/s13423-014-0585-6
Zipf’s word frequency law in natural language: A critical review and future directions
  • Mar 25, 2014
  • Psychonomic Bulletin & Review
  • Steven T Piantadosi

The frequency distribution of words has been a key object of study in statistical linguistics for the past 70 years. This distribution approximately follows a simple mathematical form known as Zipf's law. This article first shows that human language has a highly complex, reliable structure in the frequency distribution over and above this classic law, although prior data visualization methods have obscured this fact. A number of empirical phenomena related to word frequencies are then reviewed. These facts are chosen to be informative about the mechanisms giving rise to Zipf's law and are then used to evaluate many of the theoretical explanations of Zipf's law in language. No prior account straightforwardly explains all the basic facts or is supported with independent evaluation of its underlying assumptions. To make progress at understanding why language obeys Zipf's law, studies must seek evidence beyond the law itself, testing assumptions and evaluating novel predictions with new, independent data.
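The rank-frequency check at the heart of this line of work can be sketched in a few lines of Python: sort words by frequency, then estimate the Zipf exponent by a least-squares fit of log f(r) against log r. The toy corpus below is illustrative only; real estimates require large corpora and more careful fitting than this sketch shows.

```python
import math
from collections import Counter

# Toy corpus; any tokenized text would do in its place.
corpus = ("the quick brown fox jumps over the lazy dog "
          "the fox and the dog the fox").split()

# Rank-frequency list: frequencies in descending order, rank = position + 1.
freqs = sorted(Counter(corpus).values(), reverse=True)
xs = [math.log(r) for r in range(1, len(freqs) + 1)]
ys = [math.log(f) for f in freqs]

# Least-squares slope of log f on log r; Zipf's law predicts slope ≈ -α.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
alpha = -slope
print(f"estimated Zipf exponent alpha = {alpha:.2f}")
```

On large natural-language corpora the fitted exponent is typically close to 1, which is the regularity the review above scrutinizes.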

  • Research Article
  • Citations: 10
  • 10.1016/j.csda.2010.04.001
On the measure and the estimation of evenness and diversity
  • Apr 7, 2010
  • Computational Statistics & Data Analysis
  • Josep Ginebra + 1 more


  • Research Article
  • 10.3389/fpsyg.2022.940950
Word frequency effects found in free recall are rather due to Bayesian surprise.
  • Aug 25, 2022
  • Frontiers in psychology
  • Serban C Musca + 1 more

The inconsistent relation between word frequency and free recall performance (sometimes positive, sometimes negative, and sometimes absent) and the non-monotonic relation found between the two cannot all be explained by current theories. We propose a theoretical framework that can explain all extant results. Based on an ecological-psychology analysis of the free recall situation in terms of the environmental and informational resources available to participants, we propose that, because participants’ cognitive system has been shaped by their native language, free recall performance is best understood as the end result of relational properties that preexist the experimental situation and of the way the words from the experimental list interact with those properties. In addition, we borrow from predictive coding theory the idea that the brain constantly predicts “what is coming next”, so that it is mainly prediction errors that propagate information forward. Our ecological analysis indicates there will be “prediction errors” because the word frequency distribution in an experimental word list inevitably differs from the particular Zipf’s-law distribution of the words in the language that shaped participants’ brains. We further propose that the distributional discrepancies inherent to a given word list will trigger, as a function of the words included in the list, their order, and the words absent from the list, a surprisal signal in the brain, something that is isomorphic to the concept of Bayesian surprise. The precise moment when Bayesian surprise is triggered determines which word of the list the surprise is associated with; that word benefits from it and becomes more memorable as a direct function of the magnitude of the surprisal. Two experiments are presented showing that a proxy of Bayesian surprise explains free recall performance, and that no effect of word frequency is found above and beyond the effect of that proxy variable. We then discuss how our view can account for all data extant in the literature on the effect of word frequency on free recall.

  • Book Chapter
  • 10.4018/978-1-4666-6252-0.ch007
Research on Letter and Word Frequency and Mathematical Modeling of Frequency Distributions in the Modern Bulgarian Language
  • Jan 1, 2014
  • Tihomir Trifonov + 1 more

The purpose of this chapter is to present current research on the modern Bulgarian language, one of the oldest European languages. An information system for the management of an electronic archive of texts in Bulgarian is described, which provides the possibility of processing the collected text information. Detailed and comprehensive research on letter and word frequency in modern Bulgarian, drawn from varied sources (fiction, scientific and popular-science literature, press, legal texts, government bulletins, etc.), is performed, and the obtained results are presented. The index of coincidence of the Bulgarian language as a whole and for the individual sources is computed. The results can be utilized by different specialists: computer scientists, linguists, cryptanalysts, and others. Furthermore, through mathematical modeling, the authors found the letter and word frequency distributions and their models, and estimated their standard deviations by document.
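The index of coincidence mentioned above is a standard statistic over letter counts, IC = Σ nᵢ(nᵢ−1) / (N(N−1)); a minimal sketch, using a made-up English phrase rather than the chapter's Bulgarian data:

```python
from collections import Counter

def index_of_coincidence(text: str) -> float:
    """IC = sum of n_i*(n_i - 1) over N*(N - 1), where n_i are letter counts."""
    letters = [c for c in text.lower() if c.isalpha()]
    n = len(letters)
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

# Uniform random text over 26 letters gives IC near 1/26 (about 0.038);
# natural languages sit noticeably higher (about 0.066 for English).
print(index_of_coincidence("attack at dawn attack at dusk"))
```

Because the IC depends only on the letter-frequency distribution, it can be computed per source (press, fiction, legal texts) and compared, which is how the chapter uses it.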

  • Research Article
  • Citations: 15
  • 10.1080/09296174.2016.1265792
Variation in Word Frequency Distributions: Definitions, Measures and Implications for a Corpus-Based Language Typology
  • Jan 17, 2017
  • Journal of Quantitative Linguistics
  • Christian Bentz + 3 more

Word frequencies are central to linguistic studies investigating processing difficulty, learnability, age of acquisition, diachronic transmission and the relative weight given to a concept in society. However, there are few cross-linguistic studies on entire distributions of word frequencies, and even less on systematic changes within them. Here, we first define and test an exact measure for the relative difference between distributions – the Normalised Frequency Difference (NFD). We then apply this measure to parallel corpora in overall 19 languages, explaining systematic variation in the frequency distributions within the same language and across different languages. We further establish the NFD between lemmatised and un-lemmatised corpora as a frequency-based measure of inflectional productivity of a language. Finally, we argue that quantitative measures like the NFD can advance language typology beyond abstract, theory-driven expert judgments, towards more corpus-based, empirical and reproducible analyses.

  • Research Article
  • Citations: 13
  • 10.1080/09296174.2012.685305
A Statistical Study on Chinese Word and Character Usage in Literatures from the Tang Dynasty to the Present
  • Aug 1, 2012
  • Journal of Quantitative Linguistics
  • Qinghua Chen + 2 more

In this paper, we carried out a statistical analysis of Chinese corpora from the Tang, Song, Yuan, Ming and Qing Dynasties, as well as from the modern era. We found that character and word frequencies change over time, such that the word frequency always abides by the Zipf-Mandelbrot law p(r) = C(r + r0)^(-β), while the character frequency follows the Menzerath-Altmann law P(r) = A·e^(-ar)·r^(-b). In the case of the character frequency distribution, the exponential property increases and the power-law feature declines as time passes. We also found that more and more compound words were created since the Tang Dynasty. Single-character words show up unevenly across the whole word frequency distribution, with more of them concentrated in the earlier period, decaying exponentially.
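The Zipf-Mandelbrot form quoted in this abstract, p(r) = C(r + r0)^(-β), is easy to evaluate numerically; the sketch below uses made-up parameter values (r0 and β here are illustrative, not the paper's fitted values), with C chosen so the probabilities sum to 1.

```python
# Zipf-Mandelbrot distribution over ranks 1..R with hypothetical parameters.
R, r0, beta = 1000, 2.7, 1.1

# Unnormalized weights (r + r0)^(-beta), then normalize so sum(p) == 1.
weights = [(r + r0) ** -beta for r in range(1, R + 1)]
C = 1.0 / sum(weights)
p = [C * w for w in weights]

assert abs(sum(p) - 1.0) < 1e-9  # proper probability distribution
assert p[0] > p[1] > p[99]       # probability decreases with rank
```

Setting r0 = 0 recovers plain Zipf's law; the offset r0 flattens the distribution at the lowest ranks, which is why the Mandelbrot variant usually fits word-frequency data better near the top.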

  • Conference Article
  • Citations: 18
  • 10.1145/3232116.3232152
An Improved TF-IDF algorithm based on word frequency distribution information and category distribution information
  • May 19, 2018
  • Haoying Wu + 1 more

Traditional TF-IDF (Term Frequency-Inverse Document Frequency) feature weighting algorithm only uses word frequency information as a measure of the importance of feature items in the data set. This results in the inability to correctly reflect the differences between documents of different categories. This paper proposes an improved feature weighting algorithm FDCD-TF-IDF based on word frequency distribution information and category distribution information. The improved algorithm introduces the concept of word frequency distribution and class distribution to describe the weight of the feature item more accurately. The word frequency distribution is mainly aimed at the correlation between feature items and categories, and the category distribution can better reflect category information of feature items. This improved algorithm can accurately reflect the differences between different text categories. The experimental results show that the improved algorithm can achieve better classification results on both balanced and unbalanced text data sets.
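The traditional TF-IDF weighting that this paper extends can be sketched briefly; the documents below are toy examples, and the paper's FDCD extension (the word-frequency-distribution and category-distribution terms) is not reproduced here.

```python
import math

# Toy corpus: each document is a list of tokens.
docs = [["word", "frequency", "analysis"],
        ["word", "cloud", "graph"],
        ["category", "distribution", "information"]]

def tf_idf(term, doc, corpus):
    """Classic TF-IDF: term frequency times log inverse document frequency."""
    tf = doc.count(term) / len(doc)
    df = sum(term in d for d in corpus)          # documents containing term
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# "word" occurs in two of three documents, so its IDF is low;
# "category" occurs in only one, so it scores higher in its document.
print(tf_idf("word", docs[0], docs))
print(tf_idf("category", docs[2], docs))
```

The paper's criticism applies exactly here: the weight depends only on raw frequency counts, not on how a term's occurrences are distributed across categories.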

  • Research Article
  • Citations: 3
  • 10.2139/ssrn.2997101
Content Analysis: Frequency Distribution of Words
  • Jul 7, 2017
  • SSRN Electronic Journal
  • Mehmet F Dicle + 1 more

A wide range of academic fields uses content analysis. At the core of most common content analysis lies the frequency distribution of individual words. Websites and documents are mined for the usage of certain words as well as for their frequency. We introduce a user-written command, wordfreq, to process content (online and local) and to prepare a frequency distribution of individual words. Additionally, another user-written command, wordcloud, is introduced to draw a simple word-cloud graph for visual analysis of the frequent usage of specific words.

  • Research Article
  • 10.1142/s021972001450019x
Measurement of word frequencies in genomic DNA sequences based on partial alignment and fuzzy set.
  • Aug 1, 2014
  • Journal of bioinformatics and computational biology
  • Fumiya Shida + 1 more

Accompanied by the rapid increase in the amount of data registered in biological sequence databases, the need for a fast method of sequence comparison applicable to large sequences is also increasing. In general, alignment is used for sequence comparison. However, alignment may not be appropriate for comparing large sequences, such as whole genome sequences, due to its large time complexity. In this article, we propose a semi-alignment-free method of sequence comparison based on word frequency distributions, in which we partially use alignment to measure word frequencies, together with ideas from fuzzy set theory. Experiments with ten bacterial genome sequences demonstrated that the fuzzy measurements facilitate discrimination between close relatives and distant relatives.
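Alignment-free methods of this kind start from k-mer ("word") frequency vectors; the sketch below shows only that shared first step, counting overlapping k-mers in a toy sequence, and does not reproduce the paper's fuzzy partial-alignment refinement.

```python
from collections import Counter

def kmer_freqs(seq: str, k: int) -> Counter:
    """Count overlapping k-mers (length-k words) in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

# Toy sequence; real inputs would be whole-genome FASTA records.
print(kmer_freqs("ACGTACGT", 3))
```

Two genomes can then be compared by a distance between their k-mer frequency vectors (e.g. cosine or Euclidean), avoiding the quadratic cost of full alignment.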
