Articles published on Word Boundary Markers
34 Search results
- Research Article
- 10.3758/s13423-025-02785-4
- Feb 19, 2026
- Psychonomic bulletin & review
- Jingxin Wang + 6 more
Grammaticality decision studies show that word order is processed flexibly during reading, as participants often misread sentences containing transposed words as if they were correctly ordered (e.g., Mirault et al., Psychological Science, 29 (12), 1922-1929, 2018). The OB1-Reader model (Snell et al., Psychological Review, 125 (6), 969-984, 2018) explains this effect as arising from positional uncertainty during parallel word recognition, proposing that low-level visual cues like word length help constrain word positions, so that transpositions are easier to detect when words differ in length. In Chinese, the absence of this effect has been attributed to limited variability in word length and lack of explicit word boundaries. We therefore investigated whether marking word boundaries using interword spaces (Experiment 1) or alternating text-color (Experiment 2) would elicit a word-length effect on transposed-word detection in Chinese. Both experiments produced robust transposed-word effects, some indication that explicit boundary cues improve transposed-word detection, but with no evidence that they elicit a word-length effect on transposed-word detection. Together with converging evidence from French, these findings suggest both that boundary cues do not reliably reduce positional uncertainty in Chinese, and that low-level visual cues like word length have limited influence on positional processing in either alphabetic scripts or Chinese.
- Research Article
- 10.1017/s136672892510076x
- Nov 27, 2025
- Bilingualism: Language and Cognition
- Lin Li + 6 more
Sentences written in Chinese are composed of continuous sequences of characters, without spaces or other visual cues to mark word boundaries. While skilled L1 readers can efficiently segment this naturally unspaced text into words, little is known about the word segmentation capabilities of L2 readers, including whether they employ the same strategies to process temporary segmental ambiguities. Accordingly, we report two eye movement experiments that investigated the processing of sentences containing temporarily ambiguous “incremental” three-character words (e.g., “体育馆,” meaning “stadium”) whose first two characters could also form a word (“体育,” meaning “sport”), comparing the performance of 48 skilled L1 Chinese readers and 48 high-proficiency L2 Chinese readers in each experiment. Our findings reveal that both groups can process this ambiguity efficiently, employing similar word segmentation strategies. We discuss our findings in relation to models of eye movement control and word recognition in Chinese reading.
- Research Article
- 10.3389/fpsyg.2025.1652627
- Sep 19, 2025
- Frontiers in Psychology
- Weiqiong Jin + 2 more
Given that Chinese text lacks explicit spaces to mark word boundaries, readers need to segment the continuous text into words of varying lengths. Contextual information helps determine word boundaries in Chinese reading. However, it remains unclear how contextual constraint and word length information guide eye movements during Chinese reading. To address this issue, the present study examined the relationship between contextual constraint and word length information in determining when and where to move the eyes in Chinese reading. We manipulated contextual constraint such that the target words were either predictable or unpredictable, and manipulated word length such that the target words were either single-character or three-character. The results demonstrated that both contextual constraint and word length influenced word skipping, fixation durations, saccade lengths, and landing positions. However, we did not find significant interactions between them across all measures. Moreover, Bayes factor analysis provided strong evidence for the absence of an interaction, suggesting that contextual constraint does not modulate the effect of word length on eye-movement control in Chinese reading. These findings advance our understanding of eye-movement control mechanisms in Chinese reading and provide empirical evidence for improving existing models of Chinese reading.
- Research Article
2
- 10.3390/bs15070904
- Jul 3, 2025
- Behavioral sciences (Basel, Switzerland)
- Lin Li + 3 more
Chinese lacks explicit word boundary markers, creating frequent temporary segmental ambiguities where character sequences permit multiple plausible lexical analyses. Skilled native (L1) Chinese readers resolve these ambiguities efficiently. However, mechanisms underlying word segmentation in second language (L2) Chinese reading remain poorly understood. Our study investigated: (1) whether L2 readers experience greater difficulty processing temporary segmental ambiguities compared to L1 readers, and (2) whether visual boundary cues can facilitate ambiguity resolution in L2 reading. We measured the eye movements of 102 skilled L1 and 60 high-proficiency L2 readers for sentences containing temporarily ambiguous three-character incremental words (e.g., a word meaning "musical"), where the initial two characters (a word meaning "music") also form a valid word. Sentences were presented using either neutral mono-color displays providing no segmentation cues, or color-coded displays marking word boundaries. The color-coded displays employed either uniform coloring to promote resolution of the segmental ambiguity or contrasting colors for the two-character embedded word versus the final character to induce a segmental misanalysis. The L2 group read more slowly than the L1 group, employing a cautious character-by-character reading strategy. Both groups nevertheless appeared to process the segmental ambiguity effectively, suggesting shared segmentation strategies. The L1 readers showed little sensitivity to visual boundary cues, with little evidence that this influenced their ambiguity processing. By comparison, L2 readers showed greater sensitivity to these cues, with some indication that they affected ambiguity processing. The overall sentence-level effects of color coding word boundaries were nevertheless modest for both groups, suggesting little influence of visual boundary cues on overall reading fluency for either L1 or L2 readers.
- Research Article
3
- 10.1007/s41809-024-00159-1
- Jan 29, 2025
- Journal of Cultural Cognitive Science
- Xiaoyun Wang + 2 more
Spaces or colors? The role of marking word boundaries on reading aloud in Javanese script
- Research Article
1
- 10.1016/j.learninstruc.2024.102034
- Oct 18, 2024
- Learning and Instruction
- Weiyan Liao + 1 more
Does word boundary information facilitate Chinese sentence reading in children as beginning readers?
- Research Article
7
- 10.1111/nyas.15178
- Jul 1, 2024
- Annals of the New York Academy of Sciences
- Linjieqiong Huang + 2 more
One difference among writing systems is how orthographic cues are used to demarcate words; although most alphabetic scripts use inter-word spaces, some Asian scripts do not explicitly mark word boundaries (e.g., Chinese). It is unclear whether these differences are arbitrary or whether they are designed to maximize reading efficiency. Here, we show that spaces inserted between words in non-demarcated scripts provide less information about word boundaries than spaces in demarcated scripts. Furthermore, despite the fact that less information is contained by inter-word spaces than characters/letters of the same size, the information content of inter-word spaces in demarcated scripts is closer to that of characters/letters compared to the information content of inter-word spaces that are inserted in non-demarcated scripts. These results suggest that the conventions used to demarcate word boundaries are sufficient to support efficient reading. Our findings provide new insights into the universals and variation across writing systems and shed light on the mental processes that support skilled reading.
- Research Article
6
- 10.1038/s41598-022-25759-1
- Jan 6, 2023
- Scientific Reports
- Danhui Wang + 7 more
Interword spaces exist in the texts of many languages that use alphabetic writing systems. In most cases, interword spaces, as a kind of word boundary information, play an important role in the reading process. Although Tibetan also uses alphabetic writing, its text has no spaces between words as word boundary markers. Instead, there are intersyllable tshegs, which are superscript dots. It is therefore interesting to investigate the role of tshegs and what effect replacing tshegs with spaces has on Tibetan reading. To answer these questions, Experiment 1 was conducted, in which 72 Tibetan undergraduates read sentences under three syllable-boundary conditions (normal, spaced, and untsheged). However, because Experiment 1 involved deleting and replacing tshegs, the spatial distribution of information in the Tibetan sentences differed across conditions, which may have affected the results. To rule out this potential confound, in Experiment 2, 58 undergraduates read sentences in both untsheged and alternating-color conditions. Overall, the global and local analyses revealed that tshegs, spaces, and alternating-color markers as syllable boundaries can help readers segment syllables in Tibetan reading. In Tibetan reading, both spaces and tshegs are effective visual syllable segmentation cues, and spaces are more effective visual syllable segmentation cues than tshegs.
- Research Article
6
- 10.1037/xlm0000817
- Jan 1, 2021
- Journal of Experimental Psychology: Learning, Memory, and Cognition
- Jingwen Wang + 3 more
Since there are no spaces between words to mark word boundaries in Chinese, it is common to see 2 identical neighboring characters in natural text. Usually, this occurs when there are 2 adjacent words containing the same character (we will call such a coincidental sequence of 2 identical characters repeated characters). In the present study, we examined how Chinese readers process words when there are repeated characters. In 3 experiments, we compared how Chinese readers process 4-character strings including 2 repeated characters (e.g., pinyin: xíngdòng dòngjī, meaning "behavioral motivation") with a control condition where none of the characters repeat (e.g., pinyin: xíngdòng yùwàng, meaning "behavioral desire"). In Experiment 1, the 4-character strings were presented for 40 ms and participants were asked to report as many characters as possible. Participants reported the second and third characters less accurately in the repeated condition than the control condition. In Experiments 2A and 2B, we embedded 2 different types of 4-character strings, compound Chinese characters and simple Chinese characters, into the same sentence frames, and asked participants to read these sentences normally. Gaze duration and total time on the second word were significantly longer in the repeated condition. These results suggest that the repeated characters increased the difficulty of word processing. Moreover, the results are consistent with the predictions of serial models, which assume that words are processed serially in reading.
- Research Article
4
- 10.17928/jjadh.5.2_154
- Dec 25, 2020
- Journal of the Japanese Association for Digital Humanities
- Yu-Chun Wang
With the growth of digital humanities, information technologies take on more important roles in humanities research, including the study of religion. To analyze text for further processing, many text analysis tools treat a word as a unit. However, in Chinese, there are no word boundary markers. Word segmentation is required for processing Chinese texts. Although several word segmentation tools are available for modern Chinese, there is still no practical word segmentation tool for Classical Chinese, especially for Classical Chinese Buddhist literature. In this paper, we adopt unsupervised and supervised learning techniques to build Classical Chinese word segmentation approaches for processing Buddhist literature. Normalized variation of branching entropy (nVBE) is adopted for unsupervised word segmentation. Conditional random fields (CRF) are used to generate supervised models for Classical Chinese word segmentation. The performance of our word segmentation approach achieves an F-score of up to 0.9396. The experimental results show that our proposed method is effective for correctly segmenting most Classical Chinese sentences in Buddhist literature. Our word segmentation method can be a fundamental tool for further text analysis and processing research, such as word embedding, syntactic parsing, and semantic labeling.
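The unsupervised approach in this abstract builds on branching entropy: the uncertainty about which character follows a given context, which tends to rise at word boundaries. The following is a minimal sketch of plain branching entropy only; it is not the normalized variation (nVBE) the authors adopt, and the function name and toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict


def branching_entropy(corpus: str, order: int = 1) -> Dict[str, float]:
    """Entropy (in bits) of the next character after each context of length `order`."""
    followers = defaultdict(Counter)
    for i in range(len(corpus) - order):
        context = corpus[i:i + order]
        followers[context][corpus[i + order]] += 1

    entropy = {}
    for context, counts in followers.items():
        total = sum(counts.values())
        entropy[context] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


# Toy corpus (hypothetical): "a" is followed by "b" twice and "c" once,
# while "b" and "c" are always followed by "a".
toy_entropy = branching_entropy("abacab")
```

In this toy case the context "a" has high entropy (several possible successors), while "b" and "c" have zero entropy, so a boundary would be more plausible after "a" than after "b" or "c".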
- Research Article
162
- 10.1037/rev0000248
- Nov 1, 2020
- Psychological Review
- Xingshan Li + 1 more
In the Chinese writing system, there are no interword spaces to mark word boundaries. To understand how Chinese readers conquer this challenge, we constructed an integrated model of word processing and eye-movement control during Chinese reading (CRM). The model contains a word-processing module and an eye-movement control module. The word-processing module perceives new information within the perceptual span around a fixation. The model uses the interactive activation framework (McClelland & Rumelhart, 1981) to simulate word processing, but some new assumptions were made to address the word segmentation problem in Chinese reading. All the words supported by characters in the perceptual span are activated and they compete for a winner. When one word wins the competition, it is identified and it is simultaneously segmented from text. The eye-movement control module makes the decision regarding when and where to move the eyes using the activation information of word units and character units provided by the word-processing module. The model estimates how many characters can be processed during a fixation, and then makes a saccade to somewhere beyond this point. The model successfully simulated important findings on the relation between word processing and eye-movement control, how Chinese readers choose saccade targets, how Chinese readers segment words with ambiguous boundaries, and how Chinese readers process information with parafoveal vision during Chinese sentence reading. The current model thus provides insights on how Chinese readers address some important challenges, such as word segmentation and saccade-target selection.
- Research Article
4
- 10.3390/info10100317
- Oct 16, 2019
- Information
- Karol Nowakowski + 2 more
Word segmentation is an essential task in automatic language processing for languages where there are no explicit word boundary markers, or where space-delimited orthographic words are too coarse-grained. In this paper we introduce the MiNgMatch Segmenter—a fast word segmentation algorithm, which reduces the problem of identifying word boundaries to finding the shortest sequence of lexical n-grams matching the input text. In order to validate our method in a low-resource scenario involving extremely sparse data, we tested it with a small corpus of text in the critically endangered language of the Ainu people living in northern parts of Japan. Furthermore, we performed a series of experiments comparing our algorithm with systems utilizing state-of-the-art lexical n-gram-based language modelling techniques (namely, Stupid Backoff model and a model with modified Kneser-Ney smoothing), as well as a neural model performing word segmentation as character sequence labelling. The experimental results we obtained demonstrate the high performance of our algorithm, comparable with the other best-performing models. Given its low computational cost and competitive results, we believe that the proposed approach could be extended to other languages, and possibly also to other Natural Language Processing tasks, such as speech recognition.
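The core idea in this abstract, reducing segmentation to finding the shortest sequence of lexical n-grams that matches the input, can be illustrated with dynamic programming. This is a minimal sketch under stated assumptions, not the MiNgMatch implementation: the function name, the lexicon format (unspaced key mapped to its segmented form), and the English toy lexicon are all hypothetical.

```python
from typing import Dict, List, Optional


def shortest_ngram_segmentation(
    text: str, ngrams: Dict[str, str]
) -> Optional[List[str]]:
    """Cover `text` with the fewest known n-grams; return None if no cover exists.

    `ngrams` maps an unspaced string to its segmented form (an assumed format).
    """
    n = len(text)
    max_len = max((len(key) for key in ngrams), default=0)
    inf = float("inf")
    best = [inf] * (n + 1)  # best[i]: fewest n-grams covering text[:i]
    back = [-1] * (n + 1)   # back[i]: start index of the last n-gram used
    best[0] = 0

    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            if best[j] + 1 < best[i] and text[j:i] in ngrams:
                best[i] = best[j] + 1
                back[i] = j

    if best[n] == inf:
        return None

    # Walk the back-pointers to recover the chosen n-grams in order.
    pieces = []
    i = n
    while i > 0:
        j = back[i]
        pieces.append(ngrams[text[j:i]])
        i = j
    return pieces[::-1]


# Hypothetical toy lexicon of unigrams and bigrams.
toy_ngrams = {
    "the": "the", "cat": "cat", "sat": "sat", "down": "down",
    "thecat": "the cat", "satdown": "sat down",
}
print(shortest_ngram_segmentation("thecatsatdown", toy_ngrams))
# → ['the cat', 'sat down']
```

Preferring the fewest n-grams means longer known sequences win over chains of shorter ones, which is why the two bigrams are chosen here instead of four unigrams.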
- Research Article
10
- 10.1177/1747021818799994
- Sep 24, 2018
- Quarterly Journal of Experimental Psychology
- Guojie Ma + 3 more
Given there are no interword spaces marking word boundaries in Chinese text, it remains unclear how information about word length influences eye movement control during the reading of Chinese text. In this research, we set up strict controls for word frequency and other word properties to address this knowledge gap. In Experiment 1A and Experiment 1B, a between-subjects design was used. Forty-eight pairs of one- and two-character words were selected as target words in Experiment 1A, while the same number of two- and three-character words were selected in Experiment 1B. Conversely, a within-subjects design was used in Experiment 2. Sixty sets of one-, two- and three-character words were selected as target words. The results showed that long words were skipped less often and fixated on more often than short words. Total time was shorter for shorter than for longer words but first fixation durations were longer for one- than for two-character words. Most importantly, we did not find reliable evidence to support the view that word length could modulate initial landing position and incoming saccade length in the length-matched region analyses. These findings suggest that word length influences eye movement control during reading Chinese in a way that is slightly different from that in the process of reading English.
- Research Article
1
- 10.1121/1.5036264
- Mar 1, 2018
- The Journal of the Acoustical Society of America
- Natasha L Warner + 3 more
Despite the absence of clear and reliable word boundary markers, listeners recognize words in spoken sentences. A previous study tested a spoken-word recognition model, Shortlist, by asking speakers of British English to identify real words within nonwords (McQueen, Norris & Cutler, 1994). Some words were embedded within the onsets of longer words with either a weak-strong (WS) or strong-weak (SW) stress pattern (e.g., “mess” in /dəmɛs/, the onset of “domestic,” “sack” in /sækɹɪf/, the onset of “sacrifice”), and other words were embedded without a real word onset with either a WS or SW pattern (e.g., /nəmɛs/ or /mɛstəm/ for “mess” and /sækɹək/ or /kləsæk/ for “sack”). The original study reported both competition effects (e.g., competition from “domestic” hindered recognition of “mess”) and prosodic effects (e.g., the stress in “mess” facilitated segmenting it from the preceding context). The current study aimed to replicate these results in American English. Despite the different listener population and dialect, pilot results for the current study confirm both types of effect. These data will be used to test a new American English version of the Shortlist-B model of spoken word recognition (Norris & McQueen, 2008).
- Research Article
48
- 10.1162/tacl_a_00033
- Jan 1, 2018
- Transactions of the Association for Computational Linguistics
- Yan Shao + 2 more
Word segmentation is a low-level NLP task that is non-trivial for a considerable number of languages. In this paper, we present a sequence tagging framework and apply it to word segmentation for a wide range of languages with different writing systems and typological characteristics. Additionally, we investigate the correlations between various typological factors and word segmentation accuracy. The experimental results indicate that segmentation accuracy is positively related to word boundary markers and negatively to the number of unique non-segmental terms. Based on the analysis, we design a small set of language-specific settings and extensively evaluate the segmentation system on the Universal Dependencies datasets. Our model obtains state-of-the-art accuracies on all the UD languages. It performs substantially better on languages that are non-trivial to segment, such as Chinese, Japanese, Arabic and Hebrew, when compared to previous work.
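Treating segmentation as sequence tagging means assigning each character a label that encodes its position in a word, then reading word boundaries off the labels. The sketch below assumes a B/I/E/S scheme (begin, inside, end, single); the abstract does not state the tag set, so this choice and the function names are illustrative assumptions, and no tagger is trained here.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Convert a segmented sentence into one position tag per character."""
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty strings so they produce no tags
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their predicted tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # these tags close a word
            words.append(current)
            current = ""
    if current:  # keep any trailing characters of an unfinished word
        words.append(current)
    return words


# Toy sentence (hypothetical segmentation): one three-character word, two single characters.
example_words = ["体育馆", "很", "大"]
example_tags = words_to_tags(example_words)
print(example_tags)  # → ['B', 'I', 'E', 'S', 'S']
print(tags_to_words("".join(example_words), example_tags))
```

A trained model would predict the tags from unsegmented text; the two helpers only show how training labels are derived and how predictions map back to words.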
- Research Article
- 10.4312/slo2.0.2016.2.131-155
- Sep 27, 2016
- Slovenščina 2.0: empirical, applied and interdisciplinary research
- Urška Vranjek Ošlak + 1 more
Nicknames, or usernames, in computer-mediated communication represent users on the web, so we assume that users take great care to create them in a way that represents them as well as possible. Usernames are often highly innovative and show a high degree of language play, which is influenced on the one hand by the fact that, in the types of computer-mediated communication under consideration, no two users may have the same username, and on the other hand by the individual's wish for their username to be as innovative and unique as possible. When creating a username, users conceal or reveal their identity in different ways, depending on the type of computer-mediated communication in which they take part. A study of usernames in comments on news articles and on Twitter showed that users who comment on online news tend to conceal their identity more than Twitter users do. Twitter users, on the other hand, more often reveal their identity in their username by including their first name and/or surname.
- Research Article
10
- 10.1515/iral-2016-0014
- Jan 1, 2016
- International Review of Applied Linguistics in Language Teaching
- Bene Bassetti + 1 more
Interword spacing facilitates English native readers but not native readers of Chinese, a writing system that does not mark word boundaries. L1-English readers of Chinese as a Second Language (CSL) could then be facilitated if spacing is added between words in Chinese materials. However, previous studies produced inconsistent results. This study tested the hypothesis that interword spacing facilitates L1-English CSL readers. We used an online multiple-choice gap-filling task to test 12 English CSL readers and 12 Chinese natives reading a series of eight texts of suitable difficulty, written with or without interword spacing. The CSL readers read faster with interword spacing than without, while Chinese native readers were not affected. The interword spacing effect was negatively correlated with measures of reading proficiency. It is argued that interword spacing facilitates CSL readers reading materials of sufficient complexity by facilitating their lexical parsing. Pedagogical implications are discussed.
- Research Article
61
- 10.1037/a0039725
- Nov 1, 2015
- Developmental Psychology
- Katharine Graf Estes + 1 more
To learn from their environments, infants must detect structure behind pervasive variation. This presents substantial and largely untested learning challenges in early language acquisition. The current experiments address whether infants can use statistical learning mechanisms to segment words when the speech signal contains acoustic variation produced by changes in speakers' voices. In Experiment 1, 8- and 10-month-old infants listened to a continuous stream of novel words produced by 8 different female voices. The voices alternated frequently, potentially interrupting infants' detection of transitional probability patterns that mark word boundaries. Infants at both ages successfully segmented words in the speech stream. In Experiment 2, 8-month-olds demonstrated the ability to generalize their learning about the speech stream when presented with a new, acoustically distinct voice during testing. However, in Experiments 3 and 4, when the same speech stream was produced by only 2 female voices, infants failed to segment the words. The results of these experiments indicate that low acoustic variation may interfere with infants' efficiency in segmenting words from continuous speech, but that infants successfully use statistical cues to segment words in conditions of high acoustic variation. These findings contribute to our understanding of whether statistical learning mechanisms can scale up to meet the demands of natural learning environments.
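The transitional probability patterns mentioned in this abstract can be made concrete: the probability of syllable B given syllable A is high inside a word and drops across word boundaries. The sketch below is only an illustration under assumptions; the threshold rule, the function names, and the made-up syllable stream are hypothetical and do not model the infants' learning or the study's stimuli.

```python
from collections import Counter
from typing import Dict, List, Tuple


def transitional_probabilities(syllables: List[str]) -> Dict[Tuple[str, str], float]:
    """Estimate P(next = b | current = a) from adjacent syllable pairs."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])  # the last syllable has no successor
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def segment_by_tp(syllables: List[str], threshold: float) -> List[str]:
    """Place a word boundary wherever the transitional probability dips below `threshold`."""
    if not syllables:
        return []
    tp = transitional_probabilities(syllables)
    words: List[str] = []
    current = [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tp[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


# Made-up stream built from three nonsense words in varying order.
stream = ("bi da ku pa do ti go la bu bi da ku "
          "go la bu pa do ti bi da ku").split()
print(segment_by_tp(stream, threshold=0.75))
# → ['bidaku', 'padoti', 'golabu', 'bidaku', 'golabu', 'padoti', 'bidaku']
```

In this stream every within-word transition has probability 1.0 and every between-word transition has probability 0.5, so any threshold between the two recovers the words.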
- Research Article
21
- 10.1080/13506285.2014.1002554
- Feb 18, 2015
- Visual Cognition
- Guojie Ma + 2 more
In Chinese, as there are no spaces between words to mark word boundaries, readers usually do not target their eyes to the centre of the word as readers of English do. Previous studies showed that the distribution of the initial landing positions on a word (the PVL curve) peaked at the beginning of a word when there was more than one fixation; but peaked at the centre of a word if there was only one fixation on the word. Based on this phenomenon, it was argued that Chinese readers move their eyes to the beginning of a word if they cannot correctly segment words in the parafovea, but move to the centre of a word if they can. In the present study, we implemented a natural sentence reading task in Experiment 1 and a shuffled-character reading task in Experiment 2 to test whether the above PVL phenomenon was in fact caused by word segmentation. In both experiments, we found that the different PVL patterns in multiple- and single-fixation cases occurred not only for a 3-character word region but also for a 3-character nonword region. These results suggest that the different PVL curves in multiple- and single-fixation cases are likely to be due to a statistical artefact instead of parafoveal word segmentation.
- Research Article
13
- 10.1111/ejn.12008
- Oct 1, 2012
- European Journal of Neuroscience
- Antoine J Shahin + 1 more
This study examined the neurophysiological mechanisms of speech segmentation, the process of parsing the continuous speech signal into isolated words. Individuals listened to sequences of two monosyllabic words (e.g. gas source) and non-words (e.g. nas sorf). When these phrases are spoken, talkers usually produce one continuous s-sound, not two distinct s-sounds, making it unclear where one word ends and the next one begins. This ambiguity in the signal can also result in perceptual ambiguity, causing the sequence to be heard as one word (failed to segment) or two words (segmented). We compared listeners' electroencephalogram activity when they reported hearing one word or two words, and found that bursts of fronto-central alpha activity (9-14 Hz), following the onset of the physical /s/ and end of phrase, indexed speech segmentation. Left-lateralized beta activity (14-18 Hz) following the end of phrase distinguished word from non-word segmentation. A hallmark of enhanced alpha activity is that it reflects inhibition of task-irrelevant neural populations. Thus, the current results suggest that disengagement of neural processes that become irrelevant as the words unfold marks word boundaries in continuous speech, leading to segmentation. Beta activity is likely associated with unifying word representations into coherent phrases.