Related Topics

  • Words In Sentence
  • Word Sequences
  • Word Strings
  • Lexical Information
  • Compound Words

Articles published on Word Boundary Information

33 Search results
  • Research Article
  • Citations: 8
  • 10.1109/tnnls.2025.3528416
Hierarchical Label-Enhanced Contrastive Learning for Chinese NER.
  • Jun 1, 2025
  • IEEE Transactions on Neural Networks and Learning Systems
  • Chengyu Wang + 6 more

Recently, character-word lattice structures have achieved promising results for Chinese named entity recognition (NER), reducing word segmentation errors and increasing word boundary information for character sequences. However, constructing the lattice structure is complex and time-consuming, so these lattice-based models usually suffer from low inference speed. Moreover, the quality of the lexicon affects the accuracy of the NER model. Since noise words can potentially confuse NER, limited coverage of the lexicon can cause lattice-based models to degenerate into partial character-based models. In this article, we propose a hierarchical label-enhanced contrastive learning (HLCL) method for Chinese NER. Instead of relying on the lattice structure, HLCL offers an alternative solution to robustly integrate entity boundary and type information with the help of both label semantics and contrastive learning. HLCL is empowered by two techniques: 1) sentence-level contrastive learning (SCL) to model global mutual information between two different modalities (e.g., labels and sentences) and 2) token-level contrastive learning (TCL) to close the gap between representations of different characters (e.g., label-enhanced characters and original characters), resulting in local mutual information. With the well-designed contrastive learning scheme and the concise model during inference, HLCL can fully leverage the transferable label semantics and has a superb speed of inference. Experiments on four Chinese NER datasets show that HLCL obtains excellent efficiency as well as performance compared with existing lattice-based approaches.

  • Research Article
  • 10.47852/bonviewjdsis42024432
Low-Resource Chinese Named Entity Recognition via CNN-based Multitask Learning
  • Dec 17, 2024
  • Journal of Data Science and Intelligent Systems
  • Tao Wu + 5 more

Named entity recognition (NER) is a fundamental subtask of information extraction that aims to locate and classify named entities in unstructured text into predefined categories. Recently, large-scale language models (LLMs) have achieved SOTA performance on a variety of natural language processing tasks. However, because NER is a sequence labeling task in nature while LLMs are text-generation models, the performance of LLMs on NER is still significantly below supervised baselines, and NER remains a difficult task. Meanwhile, the word boundary and semantic information of Chinese words are usually quite vague, as words contained in Chinese texts are not separated by spaces. Thus, the NER task still requires a supervised learning paradigm and relies heavily on large amounts of labeled data, such as entity type and boundary information. However, the cost of labeling data can be prohibitively large, and purely supervised approaches usually suffer from poor generalization capability. In this article, we propose a multitask learning-based bidirectional iterated dilated convolution model, BCNN-CWS, for low-resource NER that leverages word boundary information from the Chinese word segmentation (CWS) task. Specifically, to efficiently recognize named entities, an iterated dilated convolutional model with a limited number of layers is implemented. In addition, a bidirectional causal convolution mechanism is presented for contextual information extraction. Results of extensive experiments on public Chinese datasets demonstrate that BCNN-CWS achieves superior performance over state-of-the-art models, and it yields up to about 50% speed improvement over existing methods. It is worth noting that BCNN-CWS can be further improved by combining it with a pretrained model.

Received: 25 September 2024 | Revised: 4 November 2024 | Accepted: 28 November 2024
Conflicts of Interest: The authors declare that they have no conflicts of interest to this work.
Data Availability Statement: The data that support the findings of this study are openly available on GitHub at https://github.com/jiangfeng13/BCNN-CWS
Author Contribution Statement: Tao Wu: Conceptualization, Methodology, Writing – original draft, Writing – review & editing, Visualization, Supervision. Xinwen Cao: Resources, Data curation. Feng Jiang: Software, Validation, Formal analysis, Investigation, Writing – original draft. Canyixing Cui: Data curation, Writing – review & editing. Xuehao Li: Resources. Xingping Xian: Supervision, Project administration, Funding acquisition.

  • Research Article
  • Citations: 1
  • 10.1016/j.learninstruc.2024.102034
Does word boundary information facilitate Chinese sentence reading in children as beginning readers?
  • Oct 18, 2024
  • Learning and Instruction
  • Weiyan Liao + 1 more

  • Research Article
  • 10.1109/access.2024.3507382
Enhancing Sindhi Word Segmentation Using Subword Representation Learning and Position-Aware Self-Attention
  • Jan 1, 2024
  • IEEE Access
  • Wazir Ali + 5 more

Sindhi word segmentation is a challenging task due to space omission and insertion issues. The Sindhi language itself adds to this complexity. It’s cursive and consists of characters with inherent joining and non-joining properties, independent of word boundaries. Existing Sindhi word segmentation methods rely on designing and combining hand-crafted features. However, these methods have limitations, such as difficulty handling out-of-vocabulary words, limited robustness for other languages, and inefficiency with large amounts of noisy or raw text. Neural network-based models, in contrast, can automatically capture word boundary information without requiring prior knowledge. In this paper, we propose a Subword-Guided Neural Word Segmenter (SGNWS) that addresses word segmentation as a sequence labeling task. The SGNWS model incorporates subword representation learning through a bidirectional long short-term memory encoder, position-aware self-attention, and a conditional random field. Our empirical results demonstrate that the SGNWS model achieves state-of-the-art performance in Sindhi word segmentation on six datasets.
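
Treating word segmentation as sequence labeling, as the abstract above describes, means assigning each character a boundary tag and reading the words back off the tag sequence. The sketch below illustrates that conversion in Python. The B/I/E/S tag set and the function names `words_to_tags` and `tags_to_words` are assumptions made for illustration; the abstract does not state which tagging scheme SGNWS uses.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Map a segmented sentence to one boundary tag per character.

    Assumed scheme: B = word-initial, I = word-internal,
    E = word-final, S = single-character word.
    """
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        elif len(word) > 1:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their boundary tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that stops mid-word
        words.append(current)
    return words
```

A tagger trained on such labels predicts one tag per character, so unseen words can still be segmented as long as their boundaries are tagged correctly.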

  • Research Article
  • Citations: 6
  • 10.1016/j.specom.2023.102970
Correction of whitespace and word segmentation in noisy Pashto text using CRF
  • Aug 14, 2023
  • Speech Communication
  • Ijazul Haq + 3 more

  • Open Access
  • Research Article
  • Citations: 8
  • 10.1145/3604811
A Joint Entity and Relation Extraction Model based on Efficient Sampling and Explicit Interaction
  • Aug 11, 2023
  • ACM Transactions on Intelligent Systems and Technology
  • Qibin Li + 4 more

Joint entity and relation extraction (RE) constructs a framework that unifies entity recognition and relation extraction, and the approach can exploit the dependencies between the two tasks to improve performance. However, existing methods still have two problems. First, when the model extracts entity information, the boundary is blurred. Second, interactions between modules are mostly implicit, that is, the interactive information is hidden inside the model; such implicit interactions are often insufficient in degree and lack interpretability. To this end, this study proposes a joint entity and relation extraction model (ESEI) based on Efficient Sampling and Explicit Interaction. We innovatively divide the negative samples in sentences based on whether they overlap with positive samples, which improves the model’s ability to extract entity word boundary information by controlling the sampling ratio. To increase the explicit interaction between the modules, we introduce a heterogeneous graph neural network (GNN) into the model, which serves as a bridge linking the entity recognition module and the relation extraction module and enhances the interaction between the modules through information transfer. Our method substantially improves the model’s discriminative power on entity extraction tasks and enhances the interaction between relation extraction tasks and entity extraction tasks. Experiments on four datasets show that the method is effective: for joint entity and relation extraction, our model improves the F1 score on multiple datasets.
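
The sampling idea in the abstract above, splitting negative spans by whether they overlap a gold entity and then controlling the mix, can be sketched as follows. This is only an illustration under stated assumptions: spans are half-open token ranges, and the names `split_negative_spans`, `sample_negatives`, `max_len`, and `overlap_ratio` are hypothetical, not taken from the ESEI paper.

```python
import random
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def split_negative_spans(
    n_tokens: int, positives: Set[Span], max_len: int
) -> Tuple[List[Span], List[Span]]:
    """Enumerate non-entity spans and split them by overlap with gold entities."""
    overlapping: List[Span] = []
    disjoint: List[Span] = []
    for start in range(n_tokens):
        for end in range(start + 1, min(start + max_len, n_tokens) + 1):
            span = (start, end)
            if span in positives:
                continue  # gold entities are not negatives
            if any(overlaps(span, p) for p in positives):
                overlapping.append(span)
            else:
                disjoint.append(span)
    return overlapping, disjoint


def sample_negatives(
    overlapping: List[Span],
    disjoint: List[Span],
    k: int,
    overlap_ratio: float,
    rng: random.Random,
) -> List[Span]:
    """Draw up to k negatives, aiming for the given share of overlapping spans."""
    n_overlap = min(len(overlapping), round(k * overlap_ratio))
    n_disjoint = min(len(disjoint), k - n_overlap)
    return rng.sample(overlapping, n_overlap) + rng.sample(disjoint, n_disjoint)
```

Overlapping negatives differ from a gold entity only at its edges, so raising `overlap_ratio` exposes the model to more near-miss boundaries during training.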

  • Research Article
  • Citations: 9
  • 10.1145/3603626
Adversarial Multi-task Learning for Efficient Chinese Named Entity Recognition
  • Jul 20, 2023
  • ACM Transactions on Asian and Low-Resource Language Information Processing
  • Yibo Yan + 4 more

Named entity recognition (NER) is a fundamental task for information extraction applications. NER is challenging because of semantic ambiguities in academic literature, especially for non-Latin languages. Besides word semantic information, recognizing Chinese named entities needs to consider word boundary information, as words contained in Chinese texts are not separated with spaces. Leveraging word boundary information could help to determine entity boundaries and thus improve entity recognition performance. In this article, we propose to combine word boundary information and semantic information for named entity recognition based on multi-task adversarial learning. Specifically, we learn commonly shared boundary information of entities from multiple kinds of tasks, including Chinese word segmentation (CWS), part-of-speech (POS) tagging, and entity recognition, with adversarial learning. We learn task-specific semantic information of words from these tasks and combine the learned boundary information with the semantic information to improve entity recognition with multi-task learning. We then propose a compression method based on improved clustering to accelerate the proposed model. We conduct extensive experiments on four public benchmark datasets and two private datasets, compared with state-of-the-art baseline models, and the experimental results demonstrate that our model achieves considerable performance improvements on various evaluation datasets.

  • Research Article
  • Citations: 30
  • 10.1109/tnnls.2021.3114378
Enhancing Chinese Character Representation With Lattice-Aligned Attention.
  • Jul 1, 2023
  • IEEE Transactions on Neural Networks and Learning Systems
  • Shan Zhao + 5 more

Word-character lattice models have been proved to be effective for some Chinese natural language processing (NLP) tasks, in which word boundary information is fused into character sequences. However, due to the inherently unidirectional sequential nature, prior approaches have only learned sequential interactions of character-word instances but fail to capture fine-grained correlations in word-character spaces. In this article, we propose a lattice-aligned attention network (LAN) that aims to model dense interactions over word-character lattice structure for enhancing character representations. By carefully combining cross-lattice module, gated word-character semantic fusion unit, and self-lattice attention module, the network can explicitly capture fine-grained correlations across different spaces (e.g., word-to-character and character-to-character), thus significantly improving model performance. Experimental results on three Chinese NLP benchmark tasks demonstrate that LAN obtains state-of-the-art results compared to several competitive approaches.
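
A word-character lattice starts from lexicon matching: every substring of the character sequence that appears in a lexicon becomes a candidate word attached to the characters it covers. The sketch below shows only that matching step, not the attention network. The function names, the `max_word_len` cutoff, and the toy lexicon are assumptions for illustration; the example sentence is a commonly used ambiguous string, not data from the paper.

```python
from typing import List, Set, Tuple


def match_lexicon(
    sentence: str, lexicon: Set[str], max_word_len: int = 4
) -> List[Tuple[int, int, str]]:
    """Return (start, end, word) for every multi-character lexicon match."""
    matches: List[Tuple[int, int, str]] = []
    for start in range(len(sentence)):
        # end is exclusive; start + 2 skips single-character substrings
        for end in range(start + 2, min(start + max_word_len, len(sentence)) + 1):
            word = sentence[start:end]
            if word in lexicon:
                matches.append((start, end, word))
    return matches


def words_per_character(
    sentence: str, matches: List[Tuple[int, int, str]]
) -> List[List[str]]:
    """For each character position, list the matched words that cover it."""
    covering: List[List[str]] = [[] for _ in sentence]
    for start, end, word in matches:
        for i in range(start, end):
            covering[i].append(word)
    return covering


# Toy lexicon (an assumption for illustration only).
lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
sentence = "南京市长江大桥"
lattice = words_per_character(sentence, match_lexicon(sentence, lexicon))
```

Here the character 长 is covered by 市长, 长江, and 长江大桥, which is the kind of competing word boundary evidence a lattice model has to weigh.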

  • Research Article
  • Citations: 8
  • 10.1145/3570328
Deep Neural Network with Embedding Fusion for Chinese Named Entity Recognition
  • Mar 23, 2023
  • ACM Transactions on Asian and Low-Resource Language Information Processing
  • Kaifang Long + 7 more

Chinese Named Entity Recognition (NER) is an essential task in natural language processing, and its performance directly impacts the downstream tasks. The main challenges in Chinese NER are the high dependence of named entities on context and the lack of word boundary information. Therefore, how to integrate relevant knowledge into the corresponding entity has become the primary task for Chinese NER. Both the lattice LSTM model and the WC-LSTM model did not make excellent use of contextual information. Additionally, the lattice LSTM model had a complex structure and did not exploit the word information well. To address the preceding problems, we propose a Chinese NER method based on the deep neural network with multiple ways of embedding fusion. First, we use a convolutional neural network to combine the contextual information of the input sequence and apply a self-attention mechanism to integrate lexicon knowledge, compensating for the lack of word boundaries. The word feature, context feature, bigram feature, and bigram context feature are obtained for each character. Second, four different features are used to fuse information at the embedding layer. As a result, four different word embeddings are obtained through cascading. Last, the fused feature information is input to the encoding and decoding layer. Experiments on three datasets show that our model can effectively improve the performance of Chinese NER.

  • Research Article
  • Citations: 13
  • 10.3389/fpsyg.2023.783960
Eye movements of second language learners when reading spaced and unspaced Chinese texts
  • Mar 13, 2023
  • Frontiers in Psychology
  • Yaqiong Cui

Unlike English, Chinese does not have interword spacing in written texts, which poses difficulties for Chinese-as-a-second-language (CSL) learners’ identification of word boundaries and affects their reading comprehension and vocabulary acquisition. The eye-movement literature has suggested that interword spacing is important in alphabetic languages; examining languages that lack interword spaces such as Chinese, thus, may help to inform theoretical accounts of eye-movement control and word identification during reading. Research investigating the interword spacing effect in reading Chinese showed that adding spacing facilitated CSL learners’ reading comprehension and speed as well as vocabulary learning. However, the bulk of this research mainly looked at the learning outcomes (off-line measures), with few studies focusing on L2 learners’ reading processes. Building on this background, this study seeks to provide a descriptive perspective of the eye movements of CSL learners. In this study, 24 CSL learners with intermediate Chinese proficiency were recruited as the experimental group, and 20 Chinese native speakers were recruited as the control group. The EyeLink 1000 eye tracker was used to record their reading of four segmentation conditions of Chinese texts, namely, no space condition, word-spaced condition, non-word-spaced condition, and pinyin-spaced condition. Results show that: (1) CSL learners with intermediate Chinese proficiency generally spent less time reading Chinese texts with spaces between words, and they showed more gazes and regressions when reading texts without spaces; (2) Non-word-spaced texts and Pinyin-spaced texts interfere with CSL learners’ reading process; and (3) Intermediate CSL learners show consistent eye movement patterns in the normal no-space condition and word-spaced condition. I conclude that word boundary information can effectively guide CSL learners’ eye movement behaviors and eye saccade planning, thus improving reading efficiency.

  • Research Article
  • Citations: 6
  • 10.1038/s41598-022-25759-1
Effects of syllable boundaries in Tibetan reading
  • Jan 6, 2023
  • Scientific Reports
  • Danhui Wang + 7 more

Interword spaces exist in the texts of many languages that use alphabetic writing systems. In most cases, interword spaces, as a kind of word boundary information, play an important role in the reading process of readers. Tibetan also uses alphabetic writing, but its text has no spaces between words as word boundary markers. Instead, there are intersyllable tshegs (“་”), which are superscript dots. Interword spaces play an important role in reading as word boundary information. Therefore, it is interesting to investigate the role of tshegs and what effect replacing tshegs with spaces will have on Tibetan reading. To answer these questions, Experiment 1 was conducted in which 72 Tibetan undergraduates read sentences in three syllable-boundary conditions (normal, spaced, and untsheged). However, in Experiment 1, because we performed the experimental operations of deleting tshegs and replacing tshegs, the spatial information distribution of Tibetan sentences under different operating conditions was different, which may have had a potential impact on the experimental results. To rule out this underlying confounding factor, in Experiment 2, 58 undergraduates read sentences in both untsheged and alternating-color conditions. Overall, the global and local analyses revealed that tshegs, spaces, and alternating-color markers as syllable boundaries can help readers segment syllables in Tibetan reading. In Tibetan reading, both spaces and tshegs are effective visual syllable segmentation cues, and spaces are more effective visual syllable segmentation cues than tshegs.

  • Research Article
  • Citations: 4
  • 10.16910/jemr.14.1.6
Manipulating Interword and Interletter Spacing in Cursive Script: An Eye Movements Investigation of Reading Persian
  • May 31, 2021
  • Journal of Eye Movement Research
  • Ehab W Hermena

Persian is an Indo-Iranian language that features a derivation of Arabic cursive script, where most letters within words are connectable to adjacent letters with ligatures. Two experiments are reported where the properties of Persian script were utilized to investigate the effects of reducing interword spacing and increasing the interletter distance (ligature) within a word. Experiment 1 revealed that decreasing interword spacing while extending interletter ligature by the same amount was detrimental to reading speed. Experiment 2 largely replicated these findings. The experiments show that providing the readers with inaccurate word boundary information is detrimental to reading rate. This was achieved by reducing the interword space that follows letters that do not connect to the next letter in Experiment 1, and replacing the interword space with ligature that connected the words in Experiment 2. In both experiments, readers were able to comprehend the text read, despite the considerable costs to reading rates in the experimental conditions.

  • Research Article
  • Citations: 27
  • 10.1609/aaai.v35i16.17706
Dynamic Modeling Cross- and Self-Lattice Attention Network for Chinese NER
  • May 18, 2021
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Shan Zhao + 4 more

Word-character lattice models have been proved to be effective for Chinese named entity recognition (NER), in which word boundary information is fused into character sequences for enhancing character representations. However, prior approaches have only used simple methods such as feature concatenation or position encoding to integrate word-character lattice information, but fail to capture fine-grained correlations in word-character spaces. In this paper, we propose DCSAN, a Dynamic Cross- and Self-lattice Attention Network that aims to model dense interactions over word-character lattice structure for Chinese NER. By carefully combining cross-lattice and self-lattice attention modules with gated word-character semantic fusion unit, the network can explicitly capture fine-grained correlations across different spaces (e.g., word-to-character and character-to-character), thus significantly improving model performance. Experiments on four Chinese NER datasets show that DCSAN obtains state-of-the-art results as well as efficiency compared to several competitive approaches.

  • Research Article
  • Citations: 13
  • 10.1007/s11145-021-10164-3
Effect of alternating-color words on oral reading in grades 2–5 Chinese children: evidence from eye movements
  • May 6, 2021
  • Reading and Writing
  • Ziming Song + 3 more

There is no obvious boundary information in Chinese reading. It has been shown that the introduction of word boundary information presented with alternating colors without changing the text distribution could significantly improve the reading speed of Chinese children in grade 2 (Perea and Wang in Mem Cognit 45(7):1160−1170, 2017. https://doi.org/10.3758/s13421-017-0717-0 ). However, few studies have examined how the effect of word boundary information on children's oral reading develops and changes as children’s grade increases. The present study asked Chinese children in grades 2–5 to read alternating-color and mono-color text orally and used eye-tracking technology to explore the developmental trajectory of the influence of word boundary information on oral reading. The results indicated that children in grade 2 and grade 3 showed faster reading speeds in the alternating-color condition than in the mono-color condition. In contrast, there was no difference between the two conditions in children in grade 4 and grade 5. We discuss the mechanisms of the findings and the implications for education.

  • Research Article
  • Citations: 29
  • 10.1007/s11145-020-10067-9
Chinese children benefit from alternating-color words in sentence reading
  • Jul 21, 2020
  • Reading and Writing
  • Jinger Pan + 3 more

Word boundary information is not marked explicitly in Chinese sentences, and word ambiguity occurs in Chinese texts. This makes it difficult to parse characters into words when reading Chinese sentences, especially for beginning readers. In an eye-tracking study, we tested whether explicit word boundary information as provided by alternating text colors affects the reading performance of Chinese children and how such an effect is influenced by individual differences in word segmentation ability. Results showed that across a number of eye-movement measures, grade three children overall benefited from explicit marking of word boundaries. Additionally, children with the highest word segmentation ability showed the largest benefits in reading speed. We discuss possible implications for education.

  • Research Article
  • Citations: 41
  • 10.1017/s0142716420000211
Alternating-color words facilitate reading and eye movements among second-language learners of Chinese
  • May 1, 2020
  • Applied Psycholinguistics
  • Wei Zhou + 2 more

The present study investigated whether word-boundary information, provided by alternating colors (consistent or inconsistent with word-boundary information) in a Chinese sentence would facilitate the reading of second-language (L2) learners. Thirty-three Korean students were recruited in the eye-movement experiment. Relative to a baseline (i.e., mono-colors) condition, incorrect word segmentation produced closer fixation location toward the beginning of words, longer fixation duration, higher refixation rate, and slower reading speed. In contrast, word segmentation with alternating colors produced further fixation location toward the center of words, shorter fixation duration, lower refixation rate, and faster reading speed. These results indicate that L2 readers are capable of making use of word-boundary knowledge for saccade generation, which can result in a facilitation of reading efficiency.

  • Research Article
  • Citations: 13
  • 10.1016/j.bandl.2019.104663
Alternating-color words influence Chinese sentence reading: Evidence from neural connectivity
  • Aug 9, 2019
  • Brain and Language
  • Wei Zhou + 4 more

  • Research Article
  • Citations: 37
  • 10.3758/s13421-018-0797-5
Word segmentation by alternating colors facilitates eye guidance in Chinese reading.
  • Feb 12, 2018
  • Memory & Cognition
  • Wei Zhou + 4 more

During sentence reading, low spatial frequency information afforded by spaces between words is the primary factor for eye guidance in spaced writing systems, whereas saccade generation for unspaced writing systems is less clear and under debate. In the present study, we investigated whether word-boundary information, provided by alternating colors (consistent or inconsistent with word-boundary information) influences saccade-target selection in Chinese. In Experiment 1, as compared to a baseline (i.e., uniform color) condition, word segmentation with alternating color shifted fixation location towards the center of words. In contrast, incorrect word segmentation shifted fixation location towards the beginning of words. In Experiment 2, we used a gaze-contingent paradigm to restrict the color manipulation only to the upcoming parafoveal words and replicated the results, including fixation location effects, as observed in Experiment 1. These results indicate that Chinese readers are capable of making use of parafoveal word-boundary knowledge for saccade generation, even if such information is unfamiliar to them. The present study provides novel support for the hypothesis that word segmentation is involved in the decision about where to fixate next during Chinese reading.
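
The color manipulation described above can be pictured as assigning one color per unit and switching colors at each unit boundary: when the units are real words the coloring is consistent with word boundaries, and when they cut across words it is inconsistent. The sketch below is an illustration only; the function name `color_by_units`, the two color names, and the placeholder letters standing in for characters are assumptions, not details of the study's stimuli.

```python
from itertools import cycle
from typing import List, Sequence, Tuple


def color_by_units(
    units: List[str], colors: Sequence[str] = ("red", "green")
) -> List[Tuple[str, str]]:
    """Color every character of a unit alike, switching color at each unit boundary."""
    colored: List[Tuple[str, str]] = []
    for unit, color in zip(units, cycle(colors)):
        colored.extend((ch, color) for ch in unit)
    return colored


# Placeholder letters stand in for characters of an unspaced sentence.
consistent = color_by_units(["AB", "C", "DE"])         # units are the real words
inconsistent = color_by_units(["A", "BC", "D", "E"])   # units cut across the words
uniform = color_by_units(["ABCDE"])                    # baseline: a single color
```

All three conditions keep the same character string and spacing, so only the color change positions differ between them.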

  • Open Access
  • Research Article
  • Citations: 28
  • 10.1037/xhp0000425
Spelling ability selectively predicts the magnitude of disruption in unspaced text reading.
  • Sep 1, 2017
  • Journal of Experimental Psychology: Human Perception and Performance
  • Aaron Veldre + 2 more

We examined the effect of individual differences in written language proficiency on unspaced text reading in a large sample of skilled adult readers who were assessed on reading comprehension and spelling ability. Participants' eye movements were recorded as they read sentences containing a low or high frequency target word, presented with standard interword spacing, or in one of three unsegmented text conditions that either preserved or eliminated word boundary information. The average data replicated previous studies: unspaced text reading was associated with increased fixation durations, a higher number of fixations, more regressions, reduced saccade length, and an inflation of the word frequency effect. The individual differences results provided insight into the mechanisms contributing to these effects. Higher reading ability was associated with greater overall reading speed and fluency in all conditions. In contrast, spelling ability selectively modulated the effect of interword spacing with poorer spelling ability predicting greater difficulty across the majority of sentence- and word-level measures. These results suggest that high quality lexical representations allowed better spellers to extract lexical units from unfamiliar text forms, inoculating them against the disruptive effects of being deprived of spacing information.

  • Research Article
  • 10.1007/s10579-016-9354-7
A comparative study of dictionaries and corpora as methods for language resource addition
  • May 21, 2016
  • Language Resources and Evaluation
  • Shinsuke Mori + 1 more

In this paper, we investigate the relative effect of two strategies for language resource addition for Japanese morphological analysis, a joint task of word segmentation and part-of-speech tagging. The first strategy is adding entries to the dictionary and the second is adding annotated sentences to the training corpus. The experimental results showed that addition of annotated sentences to the training corpus is better than the addition of entries to the dictionary. In particular, adding annotated sentences is especially efficient when we add new words with contexts of several real occurrences as partially annotated sentences, i.e. sentences in which only some words are annotated with word boundary information. According to this knowledge, we performed real annotation experiments on invention disclosure texts and observed word segmentation accuracy. Finally we investigated various language resource addition cases and introduced the notion of non-maleficence, asymmetricity, and additivity of language resources for a task. In the WS case, we found that language resource addition is non-maleficent (adding new resources causes no harm in other domains) and sometimes additive (adding new resources helps other domains). We conclude that it is reasonable for us, NLP tool providers, to distribute only one general-domain model trained from all the language resources we have.

Copyright 2026 Cactus Communications. All rights reserved.