Related Topics

  • Syllable Boundaries
  • Disyllabic Words
  • Polysyllabic Words

Articles published on Word Boundary

1099 Search results
  • Research Article
  • 10.9734/ajrcos/2026/v19i4853
Translating Braille Patterns into Arabic Text Using a Convolutional Neural Network
  • Apr 24, 2026
  • Asian Journal of Research in Computer Science
  • Mohammed Abdalati Gerbadi + 2 more

Optical recognition of Braille patterns has been investigated in several studies, and many of them have analyzed Braille patterns in various natural languages. However, Arabic patterns have not been examined to the same extent as those of other languages, owing to the lack of Arabic Braille datasets and the shortage of research in this area. This study uses the YOLOv11 model as a detector because of its relative effectiveness and accuracy, together with a staged training approach using the AdamW optimizer and Automatic Mixed Precision. To translate the Arabic patterns into text, several post-processing steps are performed: detected cells are clustered vertically into lines, sorted horizontally within each line, grouped into words using adaptively defined word boundaries, and read in corrected right-to-left order. In the experiments, the best results reach 0.99 on all three of precision, recall, and F1 score. Moreover, framework-level runtime measurements indicate that the total processing time (inference + post-processing for text extraction) ranges between 29 and 82 ms per image. The proposed framework was evaluated on a primary dataset of 5924 page images of Braille patterns covering 45 classes of Arabic letters and diacritics. The results show that the proposed framework is a robust approach toward effective, scalable, and responsive Arabic optical Braille recognition (OBR) for mobile and wearable assistive technologies. By building the first dedicated Arabic Braille corpus and providing an end-to-end recognition suite, this study lays the groundwork for future research and applications in the field. This research helps bridge the accessibility gap by allowing sighted individuals to access content encoded in Braille.
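The four post-processing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `Cell` record, the line tolerance `line_tol`, and the word-gap rule based on each line's median cell width (`gap_factor`) are hypothetical stand-ins for the paper's adaptive rules, and labels are placeholder transliterations rather than Arabic classes.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Cell:
    """One detected Braille cell (hypothetical detector output format)."""
    x: float      # horizontal centre in pixels
    y: float      # vertical centre in pixels
    w: float      # bounding-box width in pixels
    h: float      # bounding-box height in pixels
    label: str    # predicted class (placeholder transliteration)


def group_into_lines(cells, line_tol=0.5):
    """Step 1: cluster cells into lines, top to bottom, by vertical position."""
    lines = []
    for cell in sorted(cells, key=lambda c: c.y):
        # Same line if the vertical offset from the previous cell is small
        # relative to the cell height; otherwise start a new line.
        if lines and abs(cell.y - lines[-1][-1].y) <= line_tol * cell.h:
            lines[-1].append(cell)
        else:
            lines.append([cell])
    return lines


def line_to_text(line, gap_factor=0.6):
    """Steps 2-4: sort right-to-left and insert spaces at wide gaps."""
    # Arabic is read right to left, so sort by decreasing x.
    ordered = sorted(line, key=lambda c: c.x, reverse=True)
    # Adaptive threshold: scale the gap test by this line's median cell width.
    threshold = gap_factor * median(c.w for c in ordered)
    chars = [ordered[0].label]
    for prev, cur in zip(ordered, ordered[1:]):
        # Gap between the left edge of prev and the right edge of cur.
        gap = (prev.x - prev.w / 2) - (cur.x + cur.w / 2)
        if gap > threshold:
            chars.append(" ")
        chars.append(cur.label)
    return "".join(chars)


def cells_to_text(cells):
    """Run the whole post-processing chain; one output line per Braille line."""
    return "\n".join(line_to_text(line) for line in group_into_lines(cells))
```

For example, cells labelled `b` and `a` that sit 2 px apart, an `l` cell 28 px further to the left, and an `m` cell on a lower line yield `"ba l\nm"`. Scaling the word-gap threshold per line keeps it roughly stable across page resolutions, which is one plausible reading of "adaptively defined".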

  • Research Article
  • 10.1080/17501229.2026.2655360
Impact of speech-stream segmentation on improving listening comprehension and linguistic self-confidence in primary Arabic and English language classrooms
  • Apr 9, 2026
  • Innovation in Language Learning and Teaching
  • Abdulmajeed Alghamdi

In a quasi-experimental mixed-methods study, 112 native Arabic-speaking primary school students in Saudi Arabia learning English as a foreign language were assessed using various measures to determine the potential impact of speech-stream segmentation on their Arabic and English listening comprehension and linguistic self-confidence. Pre- and post-test listening comprehension tests, questionnaires, and interviews in both languages were employed in this study. From the initial sample, 56 Arabic language students were randomly divided into two groups: an experimental group, who used speech-stream segmentation strategies in all listening activities, and a comparison group, who were taught the same lessons using traditional methods. Similarly, 56 English language students were randomly selected from the same school and divided into two groups with the same instructional design. The results showed statistically significant improvements in listening comprehension and linguistic self-confidence for the groups using speech-stream segmentation, in both the first and the foreign language. The results further indicate that, among speech-stream segmentation strategies, the L1 group relies more on phonological knowledge, allophonic variation, and prosodic cues, whereas the EFL group places greater emphasis on segmenting the speech stream, recognizing distributional patterns, and identifying word boundaries. Several factors affected the effectiveness of speech-stream segmentation in improving listening comprehension and linguistic self-confidence; however, these factors differed between the two languages. Key areas for future research on speech-stream segmentation and listening comprehension are outlined.

  • Research Article
  • 10.17507/jltr.1702.34
Coronal-Triggered Voicing Assimilation in Najdi Arabic: A Phonetic and Phonological Analysis
  • Mar 2, 2026
  • Journal of Language Teaching and Research
  • Mohammad Aljutaily

This paper investigates voicing assimilation across morpheme and word boundaries in Najdi Arabic (NA) from both phonetic and phonological perspectives. Although voicing assimilation has been widely documented cross-linguistically, its realization in NA remains underexplored. Using autosegmental theory and feature geometry, the study examines how the alveolar stop /t/ in the proclitic /mit-/ behaves when followed by different consonants and whether assimilation in NA is categorical or gradient. Acoustic data were elicited from eight native male speakers, yielding 288 tokens produced in controlled elicitation tasks. Vowel duration, F1, and F2 were obtained in Praat and statistically compared across voiceless (VL) and voiced (VD) contexts. The results reveal systematic regressive assimilation when /t/ precedes a coronal obstruent, producing a fully identical geminate through delinking and reassociation of the C-place node. In VD contexts, vowels preceding assimilated segments were longer, exhibited lower F1 and higher F2 values, and showed continuous voicing, providing clear acoustic evidence of assimilation. Assimilation was consistently blocked before non-coronal and sonorant consonants, confirming feature-geometry predictions. The same mechanism operated across both morpheme and word boundaries, indicating a unified assimilation rule in NA. Overall, the findings show that voicing assimilation in NA is categorically implemented yet phonetically grounded, situating the dialect within the broader Arabic typology.
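The categorical pattern described above (total regressive assimilation of /t/ before a coronal obstruent, blocked before non-coronals and sonorants) can be stated as a rewrite over segment lists. A minimal sketch: the segment inventory and the example stems are hypothetical placeholders rather than the study's items, and the code captures only the categorical output, not the gradient acoustic cues (vowel duration, F1, F2).

```python
# Illustrative inventory only; the study's exact segment classes may differ.
CORONAL_OBSTRUENTS = {"t", "d", "s", "z", "ʃ", "θ", "ð", "tˤ", "dˤ", "sˤ", "ðˤ"}


def attach_mit(stem):
    """Attach the proclitic /mit-/ to a stem given as a list of segments."""
    first = stem[0]
    if first in CORONAL_OBSTRUENTS:
        # Regressive total assimilation: /t/ copies every feature of the
        # following coronal obstruent (place and voicing), giving a geminate.
        # In feature-geometry terms this models delinking and reassociation
        # of the C-place node; here the segment is simply copied.
        return ["m", "i", first] + stem
    # Blocked before non-coronals and sonorants: /t/ surfaces unchanged.
    return ["m", "i", "t"] + stem
```

Because the abstract reports the same mechanism across morpheme and word boundaries, the same function could in principle be applied at either boundary type.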

  • Research Article
  • 10.3758/s13423-026-02884-w
Reading fluency and word segmentation agreement modulate the benefits of word boundary cues for older readers in traditional Chinese.
  • Mar 1, 2026
  • Psychonomic bulletin & review
  • Yiu-Kei Tsang + 2 more

Despite having extensive reading experience, older readers suffer from declines in visual acuity and processing speed, which may undermine their ability to segment words in unspaced scripts like Chinese. While using text colors to highlight word boundaries can aid reading and eye movement control for readers of simplified Chinese, it remains unclear whether the benefits extend to older readers, especially those who read the visually more complex traditional Chinese script. This study investigated this question in three conditions: a baseline monocolor condition, a word segmentation condition where words were marked by alternating text colors, and a nonword segmentation condition. By tracking the eye movements of 76 older readers, we found a robust interference effect from nonword segmentation across all reading and oculomotor measures. In contrast, benefits of word segmentation cues were strikingly specific, emerging only for readers with lower vocabulary knowledge and for words with clear, unambiguous boundaries. This reveals that the utility of explicit word boundary cues depends on a dynamic interplay between visual processing and vocabulary knowledge. These findings have important implications. Theoretically, they underscore that models of reading should account for word boundary ambiguity and readers' experience. Practically, the development of assistive reading technologies needs to be tailored to the needs of less proficient readers, who benefit most from external support.

  • Research Article
  • 10.2989/16073614.2025.2602487
A linguistic analysis of abbreviations, acronyms and ‘acreviations’ in African languages
  • Mar 1, 2026
  • Southern African Linguistics and Applied Language Studies
  • Ximbani Eric Mabaso

This article analyses the state of abbreviations and acronyms in African languages, especially Xitsonga, in order to determine their formation, structure, punctuation and naming patterns, with the aim of recommending standardisation strategies. The data was collected from various oral and written sources. The study finds that various types of shortenings manifest in human phenomena (names, social positions and relations), names of countries, months, weekdays, holidays, organisations, etc. There are long shortened texts (abstracts, summaries) and short texts (sentence, phrase, word, morpheme, syllable). This article focuses on short texts, which fall into four major categories: compression, acronym, abbreviation and ‘acreviation’/ ‘abbracronym’. Examples of these forms are za < zela (ultimately), Huriri/HRR < Huvo ya Rixaka ya Ririmi (National Language Body) and NSFAS > En-es-FAS. An abbreviation is characterised by spelling the word letter by letter (SABC), or using it only in writing (njl/njll) but uttering its full form when reading (njalo/njalonjalo, etc.). Compression and acronyms are characterised by the word’s pronounceability in normal syllables and across word boundaries. ‘Acreviation’ is derived from acronym+abbreviation while ‘abbracronym’ was formed from abbreviation+acronym. Each form further reveals different structural and punctuation patterns according to the rules of a specific language.

  • Research Article
  • 10.1177/10298649261419805
Watch this space: Primitive visual cues enhance sight-reading accuracy
  • Feb 25, 2026
  • Musicae Scientiae
  • David Duncan + 2 more

The ability to read and perform from notation is a fundamental skill in music performance. While for many musicians, staff notation is both transparent and flexible, a medium that can be used fluently and imaginatively, it is frequently experienced as complex and difficult – as a form of communication it is nobody’s first language. Stenberg and Cross showed that it is possible to make musical notation easier to read at sight; adding white spaces to simple two-part pieces led to improved sight-reading performance compared with conventional staff notation. Separating units of music visually may assist a performer to process a written score, a finding that parallels the results of research into the effects of interword separation in linguistic text, where this has been found to help readers identify word boundaries and process written information, particularly when reading in a second language. The present study extends these findings, using a selection of piano pieces varying in complexity in an adaptive paradigm. Twenty-five pianists with a range of levels of expertise in sight-reading performed at sight from both conventional staff notation and notation that had been modified by adding white spaces to denote musical groups; performances were coded for pitch, rhythm, and meter errors. Results suggest that the modified staff notation reduces error counts by around 19% when performers are nearing the threshold of their sight-reading ability, with a strong correlation between the difficulty of the task and the effect of the added visual cues.

  • Research Article
  • 10.1097/aud.0000000000001799
Mandarin-Speaking Preschoolers With Cochlear Implants Can Use Duration and Pitch to Mark Prosodic Boundaries.
  • Feb 20, 2026
  • Ear and hearing
  • Feng Xu + 3 more

This study asked whether Mandarin-speaking preschoolers with cochlear implants (CIs) can produce distinct prosodic cues for word boundary marking to disambiguate compounds from lists, and whether their productions are similar to those of their typically hearing (TH) peers. Forty-two 4- to 6-year-old Mandarin-speaking preschoolers with CIs and 64 TH peers participated in an elicited production experiment. Preschoolers produced compounds and lists in carrier sentences. Syllable duration, pause insertion, pause duration, and tonal range were acoustically analyzed. Overall, preschoolers with CIs can produce durational and pitch cues to disambiguate compounds and lists, but their syllable durations and some tonal ranges in lists differ from those of their TH peers, and they insert more pauses with longer durations. These findings suggest that, despite using CIs, preschoolers can produce prosodic cues for postlexical meaning, but their productions are not yet fully like those of their TH peers.

  • Research Article
  • 10.3758/s13423-025-02785-4
Word length does not modulate the transposed-word effect in Chinese reading: Testing the OB1-Reader.
  • Feb 19, 2026
  • Psychonomic bulletin & review
  • Jingxin Wang + 6 more

Grammaticality decision studies show that word order is processed flexibly during reading, as participants often misread sentences containing transposed words as if they were correctly ordered (e.g., Mirault et al., Psychological Science, 29 (12), 1922-1929, 2018). The OB1-Reader model (Snell et al., Psychological Review, 125 (6), 969-984, 2018) explains this effect as arising from positional uncertainty during parallel word recognition, proposing that low-level visual cues like word length help constrain word positions, so that transpositions are easier to detect when words differ in length. In Chinese, the absence of this effect has been attributed to limited variability in word length and the lack of explicit word boundaries. We therefore investigated whether marking word boundaries using interword spaces (Experiment 1) or alternating text color (Experiment 2) would elicit a word-length effect on transposed-word detection in Chinese. Both experiments produced robust transposed-word effects and some indication that explicit boundary cues improve transposed-word detection, but no evidence that they elicit a word-length effect on transposed-word detection. Together with converging evidence from French, these findings suggest both that boundary cues do not reliably reduce positional uncertainty in Chinese, and that low-level visual cues like word length have limited influence on positional processing in either alphabetic scripts or Chinese.

  • Research Article
  • 10.70728/edu.v02.i03.009
Assimilation in English and Uzbek Phonetics: A Comparative Study
  • Feb 17, 2026
  • Advances in Science and Education
  • Khasanov Elyorjon Odiljonovich

This study investigates assimilation in English and Uzbek phonetics from a comparative perspective. Focusing on consonantal assimilation, it examines articulatory and phonological patterns observed in both languages. English assimilation is closely linked to connected speech and stylistic variation, whereas Uzbek assimilation predominantly occurs within word boundaries and morphological structures. The analysis highlights shared phonetic principles alongside language-specific realisations and demonstrates the relevance of assimilation for pronunciation teaching and applied phonetics.

  • Research Article
  • 10.3390/bs16020185
The Influence of Contextual Predictability on Word Segmentation in Chinese Reading: An Eye-Tracking Study.
  • Jan 27, 2026
  • Behavioral sciences (Basel, Switzerland)
  • Mengchuan Song + 3 more

Word segmentation is a fundamental component of lexical processing, and Chinese reading, which lacks inter-word spacing, requires readers to identify word boundaries based on prior experience. Previous studies have shown that contextual predictability facilitates lexical identification in Chinese reading; however, its influence on word segmentation remains unclear. This study used eye-tracking to examine the relationship between contextual predictability and readers' segmentation preferences during Chinese sentence reading. Overlapping ambiguous three-character strings were used as the region of interest (ROI), and a 2 (segmentation type: AB-C vs. A-BC) × 2 (contextual predictability: high vs. low) within-subjects design was adopted. A total of 76 native Chinese speakers completed the task. The results showed that contextual predictability had a significant effect on skipping probability: highly predictable target character strings were skipped more often than low-predictability strings. However, contextual predictability did not influence any other eye-movement measure. In contrast, segmentation type produced consistent effects across all measures, with shorter reading times for AB-C than for A-BC, indicating a stable preference for segmenting the first two characters as a word. More importantly, no interaction emerged between contextual predictability and segmentation type, and Bayesian model comparison further supported this conclusion. These findings suggest that Chinese reading involves a robust preference for AB-C segmentation and that contextual predictability and word segmentation operate as independent processes, with predictability exerting minimal influence on word segmentation during reading. This result supports the Chinese Reading Model (CRM).

  • Research Article
  • 10.3758/s13428-025-02935-5
A database of overlapping ambiguous strings in Chinese reading.
  • Jan 26, 2026
  • Behavior research methods
  • Linjieqiong Huang + 2 more

In the absence of inter-word spaces, Chinese text sometimes presents word boundary ambiguity. One common case is the overlapping ambiguous string (OAS), a three-character string (ABC) where the middle character can form distinct words with both the character to its left (AB) and the character to its right (BC), creating segmentation ambiguity between AB-C and A-BC. This structure makes OASs a valuable tool for investigating the cognitive mechanisms of Chinese word segmentation. We introduce a comprehensive OAS database consisting of 952,497 OASs, each with 43 types of linguistic information at the character, word, and OAS levels. To illustrate how to use the database, we conducted an eye-tracking reading experiment manipulating whether the first character of the OAS (i.e., character A) could stand alone in sentences. Results showed that when character A could not stand alone, readers were more likely to group it with the next character B, leading to an AB-C segmentation. These findings validate the utility of the OAS database in understanding word segmentation during Chinese reading. The potential applications of the database in artificial intelligence, education, and writing system reform are discussed.
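The OAS definition above (a three-character window ABC in which both AB and BC are words) translates directly into a scan over a text with a lexicon lookup. A minimal sketch under that definition only: the published database's additional criteria and its 43 annotation types are not reproduced, and the toy lexicon is hypothetical.

```python
def find_oas(text, lexicon):
    """Return (start, ABC) for every overlapping ambiguous string in text.

    A window ABC counts as an OAS when both AB and BC are lexicon words,
    so it can be segmented either as AB-C or as A-BC.
    """
    hits = []
    for i in range(len(text) - 2):
        a, b, c = text[i], text[i + 1], text[i + 2]
        if a + b in lexicon and b + c in lexicon:
            hits.append((i, a + b + c))
    return hits
```

With a lexicon containing 和尚 and 尚未, the string 和尚未 is flagged at position 0, since it can be read as 和尚/未 or 和/尚未.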

  • Research Article
  • 10.20965/jaciii.2026.p0015
Handwritten Character String Recognition Using a String Recognition Transformer
  • Jan 20, 2026
  • Journal of Advanced Computational Intelligence and Intelligent Informatics
  • Shunya Rakuka + 2 more

Improving the accuracy of handwritten character string recognition allows handwritten documents to be converted into digital text. This facilitates camera-based text input, enabling robotic process automation to manage documentation tasks. Although this field has seen significant progress, recognizing handwritten Japanese remains particularly challenging due to the difficulty of character segmentation, the wide variety of character types, and the absence of clear word boundaries. These factors make unconstrained handwritten Japanese string recognition particularly difficult for conventional approaches. Moreover, transformer-based models typically require large amounts of annotated training data. This study proposes and investigates a new String Recognition Transformer (SRT) model capable of recognizing unconstrained handwritten Japanese character strings without relying on explicit character segmentation or a large number of training images. The SRT model integrates a convolutional neural network backbone for robust local feature extraction, a Transformer encoder-decoder architecture, and a sliding window strategy that generates overlapping patches. Comparative experiments show that our method achieved a character error rate (CER) of 0.067, significantly outperforming convolutional recurrent neural network, transformer-based optical character recognition, and handwritten text recognition with Vision Transformer which achieved CERs of 0.664, 0.165, and 0.106, respectively, thereby confirming the effectiveness and robustness of the approach.
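Character error rate is commonly computed as the Levenshtein edit distance between the recognized and reference strings, divided by the reference length. The sketch below assumes that standard definition; the abstract does not say whether the reported CERs are averaged per line or pooled over the test set.

```python
def levenshtein(ref, hyp):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn ref into hyp (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # delete r
                           cur[j - 1] + 1,           # insert h
                           prev[j - 1] + (r != h)))  # substitute or match
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance normalized by reference length."""
    return levenshtein(ref, hyp) / len(ref)
```

For instance, dropping one character from a five-character reference such as 手書き文字 gives a CER of 0.2.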

  • Research Article
  • Cited by 3
  • 10.1016/j.neuron.2025.10.011
Human cortical dynamics of auditory word form encoding.
  • Jan 1, 2026
  • Neuron
  • Yizhen Zhang + 4 more

We perceive continuous speech as a series of discrete words, despite the lack of clear acoustic boundaries. The superior temporal gyrus (STG) encodes phonetic elements like consonants and vowels, but it is unclear how whole words are encoded. Using high-density cortical recordings and spoken narratives, we investigated how the human brain represents auditory word forms. STG activity exhibits a distinctive reset at word boundaries, marked by a sharp drop in cortical activity. Between resets, STG encodes acoustic-phonetic, prosodic, and lexical features, supporting integration of phonological features into coherent word forms. This process tracks the relative elapsed time within words, independent of absolute duration, providing a flexible encoding of variable word lengths. Similar dynamics were found in deeper layers of a self-supervised artificial speech network. Finally, a bistable word perception task revealed trial-by-trial STG responses to perceived word boundaries. Together, these findings support a new dynamical model of auditory word forms.
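The "relative elapsed time within words" variable can be made concrete by rescaling time between consecutive word boundaries to the interval from 0 to 1, so that a point halfway through a 200 ms word and one halfway through a 400 ms word both map to 0.5. This illustrates the variable only; it is not the authors' encoding model, and the boundary times are hypothetical.

```python
from bisect import bisect_right


def relative_elapsed_time(times, boundaries):
    """Map each time point to its relative position (0 to 1) within its word.

    boundaries: sorted word-onset times, with the last entry marking the
    offset of the final word. Times outside that range map to None.
    """
    out = []
    for t in times:
        k = bisect_right(boundaries, t) - 1  # index of the word containing t
        if k < 0 or k >= len(boundaries) - 1:
            out.append(None)
            continue
        onset, offset = boundaries[k], boundaries[k + 1]
        out.append((t - onset) / (offset - onset))
    return out
```

With boundaries at 0, 200 and 600 ms, the times 100 ms and 400 ms both map to 0.5 even though the two words differ in absolute duration.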

  • Research Article
  • 10.1016/j.aej.2025.11.053
Towards a word-granularity paradigm for Chinese event detection: Targeting long-tail challenges in syntax and semantics
  • Jan 1, 2026
  • Alexandria Engineering Journal
  • Yuewei Zhou + 5 more

  • Research Article
  • 10.1017/langcog.2026.10064
The interaction of language and music: a psycholinguistic approach for a shared pitch mechanism (?)
  • Jan 1, 2026
  • Language and Cognition
  • Aris Kargakis + 2 more

Over the last decades, there has been increasing interest in the cognitive interaction between language and music. Previous research has investigated potential underlying processes shared by the two domains. While some studies do not support such a connection when examining linguistic and musical pitch, there seems to be a consensus on the existence of structural rule parallels essential to linguistic and musical adequacy. The present study examines whether a non-linguistic acoustic cue, namely a high/neutral or low musical pitch note, affects phrase and word boundaries in Greek garden-path sentences, leading to the elevation of garden-path effects, similarly to what has been suggested for rising intonation. In a self-paced reading-listening experiment in which word segments were accompanied by musical pitch notes, our results showed significant ambiguity resolution effects for both high and low musical pitch. We interpret the data as an indication of an interaction between language and music, in which general (random) sound signals may facilitate linguistic processing.

  • Research Article
  • 10.25022/jkler.2025.26.191
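A Study of the Perceptual Acquisition of Korean Phonological Processes and Learner Variables among Korean Learners from Isolating-Language Regions (China and Vietnam)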
  • Dec 30, 2025
  • The Research Society for the Korean Language Education
  • Seon Mi Lee

This study aims to examine how Chinese and Vietnamese learners acquire Korean phonological rules by comparing their perceptual patterns of Korean phonological processes across different proficiency levels. The analysis shows that liaison and lateralization are relatively easy for both groups, as they reach perception levels comparable to native speakers by the intermediate level. In contrast, aspiration and tensification after obstruents consistently yield low accuracy regardless of proficiency, indicating high perceptual difficulty. Vietnamese learners demonstrated faster acquisition of nasalization of obstruents and liquid nasalization than Chinese learners, while palatalization displayed divergent developmental trajectories depending on learners’ language backgrounds and proficiency levels. Error analysis revealed that Chinese learners frequently substituted various nasals with /n/ and perceptually neutralized aspirated or tense consonants into plain stops, whereas Vietnamese learners tended to retain underlying forms without applying phonological processes or failed to accurately process coda deletion and liaison in connected speech. Both groups had difficulty identifying word boundaries appropriately in liaison and palatalization contexts. Additionally, analyses of learning-related variables indicated no significant relationship between perceptual accuracy and factors such as Korean exposure time, L1 use, or integrative and instrumental motivation, suggesting that phonological process perception relies more on low-level phonetic and phonological cue processing than on exposure or affective variables. By integrating analyses of acquisition patterns, error types, and learner variables, this study empirically demonstrates cross-linguistic differences in the perceptual development of Korean phonological processes and highlights the need for perception-based pronunciation instruction tailored to learners’ linguistic backgrounds.

  • Research Article
  • 10.1007/s42452-025-08087-7
Developing an audio search engine for Amharic speech web resources
  • Dec 24, 2025
  • Discover Applied Sciences
  • Arega Hassen + 2 more

While general-purpose search engines primarily serve English-language content, the web has seen enormous growth in non-resource-rich languages like Amharic. As a morphologically complex language with unique linguistic characteristics, Amharic presents significant challenges for information retrieval, particularly for speech content. Although Amharic web resources are expanding across text, speech, and video formats, speech retrieval demands specialized solutions due to three key challenges: (1) the absence of explicit word boundaries requiring accurate automatic speech recognition, (2) lack of visual context compared to video content, and (3) compounding effects of Amharic's rich morphology on transcription accuracy. These challenges are exacerbated by the proliferation of online radio broadcasts, speech reports, and news content in Amharic. This study presents a dedicated Audio Search Engine for Amharic speech web resources, addressing these challenges through four key innovations: (1) an enhanced web crawler optimized for Amharic speech content, (2) robust speech transcription pipelines, (3) efficient indexing of transcribed content, and (4) language-specific query preprocessing components. Our system leverages open-source technologies, including JSpider for crawling, Sphinx for speech recognition, and Datafari for indexing and retrieval, creating an integrated solution tailored to Amharic's linguistic characteristics. Evaluation results demonstrate the system's effectiveness, achieving 80% precision in top-10 results and 92% recall compared to baseline retrieval methods. These promising results highlight our solution's capability to handle Amharic's unique challenges while providing practical retrieval performance. The study contributes both a technical framework for Amharic speech search and insights applicable to other resource-constrained languages facing similar retrieval challenges.
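"80% precision in top-10 results" corresponds to precision at k = 10, that is, on average eight relevant audio documents among every ten returned. A minimal sketch of both reported metrics under their standard definitions; the abstract does not state how relevance was judged or how the recall denominator was formed, so those are assumptions, and the document IDs are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k ranked documents that are relevant.

    Divides by k even when fewer than k results are returned (one common
    convention; some evaluations divide by the number returned instead).
    """
    top = ranked_ids[:k]
    return sum(1 for d in top if d in relevant_ids) / k


def recall(retrieved_ids, relevant_ids):
    """Fraction of all relevant documents that were retrieved."""
    return len(set(retrieved_ids) & set(relevant_ids)) / len(relevant_ids)
```

For a query whose top ten results contain eight of nine relevant documents, P@10 is 0.8 and recall is 8/9.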

  • Research Article
  • 10.1371/journal.pone.0336942.r008
WBA: Word Boundary Attention for Chinese Named Entity Recognition
  • Dec 23, 2025
  • PLOS One
  • Zhongguo Xu + 5 more

Chinese words often exhibit a parallel structural relationship within sentences, while individual characters are sequentially connected. To capture this structural distinction, we extract the head and tail positions of characters within words and incorporate them into a relative positional encoding scheme. Building upon this design, we introduce Word Boundary Attention (WBA), a mechanism that assigns dynamic attention weights to characters and enhances their representations with contextual information derived from the word lattice. By explicitly modeling word boundaries, WBA effectively suppresses noise, improves word recognition, and leverages richer lexicon-based context during training. Extensive experiments across multiple datasets demonstrate that WBA consistently outperforms existing approaches, achieving, for instance, a 2.51% improvement over the base model on the Weibo dataset with YJ lexicon encoding. Furthermore, visualizations of the learned attention weights reveal the interactive relationships between words and characters, providing interpretable insights into the process of word discovery. The source code of the proposed method is publicly available at https://github.com/na978292231/WBA/tree/main/WBA4NER-main.
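The abstract does not give the WBA formulas. One widely used way to feed head and tail positions into relative positional encoding is the flat-lattice scheme of the FLAT model, which places every character and every matched lexicon word in one sequence and computes four head/tail distances between each pair of tokens. The sketch shows only that input construction, on the assumption that WBA uses something similar; the attention weighting itself and the toy lexicon are not from the paper.

```python
def build_lattice(sentence, lexicon, max_len=4):
    """Lattice tokens as (text, head, tail): every character, then every
    lexicon word of length 2..max_len found in the sentence."""
    spans = [(ch, i, i) for i, ch in enumerate(sentence)]
    for i in range(len(sentence)):
        for j in range(i + 2, min(len(sentence), i + max_len) + 1):
            if sentence[i:j] in lexicon:
                spans.append((sentence[i:j], i, j - 1))
    return spans


def relative_distances(spans):
    """FLAT-style head/tail distance matrices between all lattice tokens."""
    n = len(spans)
    d = {key: [[0] * n for _ in range(n)] for key in ("hh", "ht", "th", "tt")}
    for a, (_, ha, ta) in enumerate(spans):
        for b, (_, hb, tb) in enumerate(spans):
            d["hh"][a][b] = ha - hb  # head to head
            d["ht"][a][b] = ha - tb  # head to tail
            d["th"][a][b] = ta - hb  # tail to head
            d["tt"][a][b] = ta - tb  # tail to tail
    return d
```

For 南京市 with a lexicon containing 南京 and 南京市, the word 南京市 and the character 市 share a tail (tail-to-tail distance 0), which is the kind of boundary signal such an encoding exposes to attention.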

  • Research Article
  • Cited by 1
  • 10.1371/journal.pone.0336942
WBA: Word Boundary Attention for Chinese Named Entity Recognition.
  • Dec 23, 2025
  • PloS one
  • Zhongguo Xu

  • Research Article
  • 10.21817/indjcse/2025/v16i6/251606014
DEPENDENCY TREE PARSING USING TRANSFORMER-BASED LANGUAGE REPRESENTATIONS
  • Dec 20, 2025
  • Indian Journal of Computer Science and Engineering
  • Nwe Nwe Win + 2 more

Dependency parsing for Myanmar faces many challenges due to agglutinative morphology, flexible word order, and the lack of explicit word boundaries. This paper addresses these challenges by investigating Myanmar dependency parsing based on three pretrained transformer models: XLM-RoBERTa-Large, XLM-RoBERTa-base-Longformer-4096, and the language-specific model MyanBERTa. The proposed methodology uses a parameter-efficient model with adapter layers and a biaffine parsing mechanism, optimizing a multi-task learning objective across six linguistic prediction tasks. The experimental results show that XLM-RoBERTa-Large achieves the highest Labeled Attachment Score (LAS) and Unlabeled Attachment Score (UAS) on both the development and test sets. Although XLM-RoBERTa-base-Longformer-4096 extends the input capacity and MyanBERTa is pretrained on a language-specific corpus, both perform worse than XLM-RoBERTa-Large. These findings suggest that model scale and architectural strength matter more for high-performance dependency parsing than an extended input capacity or pretraining on a language-specific corpus.
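The abstract names a biaffine parsing mechanism and reports LAS and UAS. The plain-Python sketch below illustrates both. It first computes the biaffine arc score in the form introduced by Dozat and Manning (2017), s(i, j) = h_jᵀ U d_i + h_jᵀ u, where d_i is a dependent representation and h_j is a candidate-head representation. It then selects heads greedily and computes UAS and LAS. The toy vectors, the weight values, and the dependency labels are hypothetical. The paper's adapters, multi-task heads, and trained weights are not represented. Real parsers usually decode with a maximum-spanning-tree algorithm rather than a greedy argmax, which guarantees that the output is a valid tree.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [dot(row, v) for row in m]


def biaffine_arc_scores(dep: List[Vector], head: List[Vector], U: Matrix, u: Vector) -> Matrix:
    """scores[i][j] = head[j]^T U dep[i] + head[j]^T u: score of token j as head of token i."""
    scores = []
    for d in dep:
        ud = matvec(U, d)  # bilinear term, reused for every candidate head
        scores.append([dot(h, ud) + dot(h, u) for h in head])
    return scores


def greedy_heads(scores: Matrix) -> List[int]:
    """Best-scoring head for tokens 1..n (position 0 is ROOT); greedy, so no tree guarantee."""
    n = len(scores)
    return [max((j for j in range(n) if j != i), key=lambda j: scores[i][j]) for i in range(1, n)]


def attachment_scores(gold_heads, gold_labels, pred_heads, pred_labels) -> Tuple[float, float]:
    """UAS: share of tokens with the correct head; LAS: correct head and correct label."""
    n = len(gold_heads)
    head_ok = [g == p for g, p in zip(gold_heads, pred_heads)]
    uas = sum(head_ok) / n
    las = sum(ok and gl == pl for ok, gl, pl in zip(head_ok, gold_labels, pred_labels)) / n
    return uas, las


# Hypothetical toy input: ROOT + 3 tokens with 2-dimensional representations.
dep = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
head = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
U = [[1.0, 0.0], [0.0, 2.0]]
u = [0.5, 0.0]
scores = biaffine_arc_scores(dep, head, U, u)
pred = greedy_heads(scores)  # [2, 0, 2]
print(attachment_scores([2, 0, 2], ["nsubj", "root", "obj"], pred, ["nsubj", "root", "obl"]))  # UAS 1.0, LAS 2/3
```

Here every predicted head is correct, but one label differs from the gold label, so UAS is 1.0 and LAS is 2/3. Evaluation conventions, such as whether punctuation tokens are scored, vary between benchmarks and are not specified in the abstract.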

Copyright 2026 Cactus Communications. All rights reserved.