Abstract
The present study reports results from a series of computer experiments seeking to combine word-based Largest Chunk (LCh) segmentation and Agreement Groups (AG) sequence processing. The AG model is based on groups of similar utterances that enable combinatorial mapping of novel utterances. LCh segmentation is concerned with cognitive text segmentation, i.e. with detecting word boundaries in a sequence of linguistic symbols. Our observations are based on the text of Le petit prince (The little prince) by Antoine de Saint-Exupéry in three languages: French, English, and Hungarian. The data suggest that word-based LCh segmentation is not very efficient with respect to utterance boundaries; however, it can provide useful word combinations for AG processing. Typological differences between the languages are also reflected in the results.
Highlights
The Agreement Groups (AG) language processing model as proposed in Drienkó (2014) is a usage-based distributional framework where groups of utterances are formed according to the distribution of words in a given corpus
The present study reports results from a series of experiments seeking to combine word-based Largest Chunk (LCh) segmentation with the AG utterance processing apparatus
The Inference Precision (IP) values show that the proportion of correctly inferred boundaries among all inferred boundaries is rather low, 16% on average
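The Inference Precision measure described in the last highlight can be illustrated with a minimal sketch. It assumes that boundaries are represented as sets of integer positions in the text; the function name and the sample positions are hypothetical and are not taken from the study.

```python
def inference_precision(inferred, true):
    """Share of inferred boundaries that are also true boundaries.

    Both arguments are collections of boundary positions (an assumed
    representation, chosen only for illustration).
    """
    inferred = set(inferred)
    if not inferred:
        # No boundaries were inferred, so precision is taken to be 0 here.
        return 0.0
    return len(inferred & set(true)) / len(inferred)


# Hypothetical positions: 2 of the 4 inferred boundaries are correct.
ip = inference_precision({3, 7, 12, 20}, {7, 15, 20, 31})
print(ip)  # 0.5
```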
Summary
The AG language processing model as proposed in Drienkó (2014) is a usage-based distributional framework where groups of utterances are formed according to the distribution of words in a given corpus. The contexts of words or phrases are helpful in categorisation research based on cluster analysis. In Mintz (2003) a context, or frame, is provided by words that immediately precede or follow a given target element, and a frequent frame is a context occurring with a frequency above an arbitrary threshold value. St. Clair et al. (2010) suggest that flexible frames, which use bigram information from contexts, are better suited to categorisation than frequent frames alone. Item-based phrases in language acquisition research, as framed by words in initial positions, constitute a specific type of context.
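The notion of a frequent frame can be sketched as follows. This is only an illustration under stated assumptions: a frame is taken to be the pair formed by the word immediately before and the word immediately after a target, utterances are whitespace-tokenised strings, and the function name, threshold and sample utterances are hypothetical.

```python
from collections import Counter, defaultdict


def frequent_frames(utterances, threshold):
    """Return frames occurring more than `threshold` times, with their fillers.

    A frame is assumed to be (preceding word, following word) around a
    single target word.
    """
    counts = Counter()
    fillers = defaultdict(set)
    for utterance in utterances:
        words = utterance.split()
        # Slide a three-word window: left context, target, right context.
        for left, target, right in zip(words, words[1:], words[2:]):
            frame = (left, right)
            counts[frame] += 1
            fillers[frame].add(target)
    # Keep only frames whose frequency is above the threshold.
    return {frame: fillers[frame] for frame, n in counts.items() if n > threshold}


sample = ["you want it", "you see it", "you want to go", "you need it"]
frames = frequent_frames(sample, threshold=2)
print(frames)
```

In this toy sample only the frame ("you", "it") occurs more than twice, so the words that fill it ("want", "see", "need") are grouped together, which is the kind of distributional grouping the frame-based categorisation literature relies on.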