LATTE: Lattice ATTentive Encoding for Character-based Word Segmentation

Thodsaporn Chay-Intr, Manabu Okumura, Kotaro Funakoshi, Hidetaka Kamigaito

doi:10.5715/jnlp.30.456

Abstract

A character sequence admits at least one, and often several, segmentation alternatives. This segmentation ambiguity can degrade word segmentation performance, and handling it properly reduces ambiguous decisions on word boundaries. Previous work has achieved remarkable segmentation performance and alleviated the ambiguity problem by incorporating a lattice, which captures segmentation alternatives, together with graph-based and pre-trained models. However, when a lattice is encoded with such models, its multi-granularity information, including characters and words, may not be exploited attentively. To strengthen multi-granularity representations in a lattice, we propose the Lattice ATTentive Encoding (LATTE) method for character-based word segmentation. Our model employs the lattice structure to handle segmentation alternatives and uses graph neural networks together with an attention mechanism to attentively extract multi-granularity representations from the lattice that complement character representations. Our experimental results demonstrate improved segmentation performance on the BCCWJ, CTB6, and BEST2010 datasets in three languages, namely Japanese, Chinese, and Thai.
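The lattice idea can be illustrated with a minimal sketch, assuming a toy lexicon and hand-picked 2-dimensional vectors. The function names (`build_lattice`, `attend_to_lattice`), the `max_len` word-length cap, and the single scaled dot-product attention step are hypothetical simplifications: the sketch omits the graph neural network layers and pre-trained encoders mentioned above and is not the authors' LATTE implementation. It only shows how a lattice stores competing segmentations (東京都 can be read as 東京|都 "Tokyo | metropolis" or 東|京都 "east | Kyoto") and how each character can attend over the word-level nodes that cover it.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical lattice edge: (start, end, token) over character positions [start, end).
Edge = Tuple[int, int, str]
Vector = List[float]


def build_lattice(text: str, lexicon: set, max_len: int = 4) -> List[Edge]:
    """Return one edge per character plus one edge per lexicon word found in `text`.

    Together the edges encode every segmentation the (toy) lexicon allows.
    `max_len` is a hypothetical cap on candidate word length.
    """
    edges: List[Edge] = []
    for i in range(len(text)):
        edges.append((i, i + 1, text[i]))  # character-level node
        for j in range(i + 2, min(len(text), i + max_len) + 1):
            if text[i:j] in lexicon:
                edges.append((i, j, text[i:j]))  # word-level node
    return edges


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_lattice(char_vecs: List[Vector], edges: List[Edge],
                      token_vecs: Dict[str, Vector]) -> List[Vector]:
    """Complement each character vector with attention over lattice tokens covering it.

    Query = character vector; keys and values = vectors of the covering tokens.
    A single scaled dot-product attention step stands in (as an assumption)
    for the GNN + attention encoder; it is not the paper's architecture.
    """
    if not char_vecs:
        return []
    dim = len(char_vecs[0])
    out: List[Vector] = []
    for i, query in enumerate(char_vecs):
        covering = [tok for start, end, tok in edges if start <= i < end]
        scores = [sum(q * k for q, k in zip(query, token_vecs[t])) / math.sqrt(dim)
                  for t in covering]
        weights = softmax(scores)
        context = [sum(w * token_vecs[t][d] for w, t in zip(weights, covering))
                   for d in range(dim)]
        out.append([q + c for q, c in zip(query, context)])  # residual complement
    return out


# Usage: 東京都 is ambiguous between 東京|都 and 東|京都.
text = "東京都"
lexicon = {"東京", "京都", "東京都"}
edges = build_lattice(text, lexicon)
token_vecs = {  # toy 2-d embeddings, chosen by hand for illustration only
    "東": [1.0, 0.0], "京": [0.0, 1.0], "都": [1.0, 1.0],
    "東京": [0.5, 0.5], "京都": [0.2, 0.8], "東京都": [0.3, 0.3],
}
char_vecs = [token_vecs[c] for c in text]
for ch, vec in zip(text, attend_to_lattice(char_vecs, edges, token_vecs)):
    print(ch, [round(x, 3) for x in vec])
```

In this toy lattice, position 1 (京) is covered by four nodes (京, 東京, 京都, 東京都), so its updated vector mixes evidence from both competing segmentations instead of committing to one before boundaries are predicted.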

Journal: Journal of Natural Language Processing
Publication Date: Jan 1, 2023
License type: CC BY

Similar Papers

Chinese Word Segmentation Based on Maximum Entropy
Xiaolin Li ... Zerong Hu
16 Oct 2019

A Levenshtein distance-based method for word segmentation in corpus augmentation of geoscience texts
Jinqu Zhang ... Weirong Li
Annals of GIS, Vol. 29, 12 Jan 2023

A robust method for line and word segmentation in handwritten text
Abdelaali Hassaine
01 Jan 2013

Investigating word segmentation of Chinese second language learners
Shuyi Yang
Reading and Writing, Vol. 34, 03 Jan 2021