Abstract

With the rapid development of Japanese information processing technology, problems such as polysemy, ambiguity at the text and dialogue level, and unregistered (out-of-vocabulary) words have become increasingly prominent, because computers cannot fully “understand” the semantics of words. For a computer to “understand” word semantics accurately, it must also “understand”, from a semantic perspective, the rules by which words are converted and combined into new words. Traditional Japanese text classification mostly relies on vector space model text representations, which tend to produce confused classification results. This paper therefore proposes building a prediction model for semantic word formation patterns based on a large-scale annotated corpus, combining Japanese semantic word formation rules with pattern recognition algorithms. Several pattern recognition algorithms were compared and analyzed, and the naive Bayes model was selected to predict semantic word formation patterns. Adding part-of-speech information further improves the accuracy of the predictions. Before modeling, the parts of speech of the words in the original annotated corpus are tagged automatically and then checked manually. A naive Bayes prediction model is then built and evaluated in simulation experiments. The eight types of semantic word formation patterns in the annotated corpus are divided into two groups, and the sample set for each group is split into a training set and a test set. The naive Bayes model first learns the semantic word formation rules from each group's training set and then predicts the semantic word formation patterns on that group's test set.
The simulation results show that the prediction model for semantic word formation patterns achieves a generally high degree of fit and prediction accuracy. A prediction model built on this approach allows the computer to judge semantic word formation patterns more accurately.
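
The train-then-predict procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature encoding (the parts of speech of a word's two components), the labels `pattern_A` and `pattern_B`, the add-one smoothing, and the toy samples are all assumptions made for illustration, and the real model is trained on the annotated corpus with eight pattern types split into two groups.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with add-one (Laplace) smoothing."""

    def fit(self, samples, labels):
        self.label_counts = Counter(labels)
        self.total = len(labels)
        # feature_counts[label][position][value] -> number of occurrences
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        # distinct values seen at each feature position (used for smoothing)
        self.vocab = defaultdict(set)
        for features, label in zip(samples, labels):
            for pos, value in enumerate(features):
                self.feature_counts[label][pos][value] += 1
                self.vocab[pos].add(value)
        return self

    def log_posterior(self, features, label):
        # log P(label) + sum_i log P(feature_i | label), up to a constant
        score = math.log(self.label_counts[label] / self.total)
        for pos, value in enumerate(features):
            count = self.feature_counts[label][pos][value]
            denom = self.label_counts[label] + len(self.vocab[pos])
            score += math.log((count + 1) / denom)
        return score

    def predict(self, features):
        # choose the label with the highest (unnormalized) log posterior
        return max(self.label_counts,
                   key=lambda label: self.log_posterior(features, label))


# Hypothetical toy data: (POS of first component, POS of second component).
# These samples and labels are invented for illustration only.
train_x = [("noun", "noun"), ("noun", "noun"), ("noun", "verb"),
           ("verb", "noun"), ("verb", "noun"), ("adjective", "noun")]
train_y = ["pattern_A", "pattern_A", "pattern_A",
           "pattern_B", "pattern_B", "pattern_B"]
test_x = [("noun", "noun"), ("verb", "noun")]
test_y = ["pattern_A", "pattern_B"]

# Learn from the training set, then predict on the held-out test set.
model = CategoricalNaiveBayes().fit(train_x, train_y)
predictions = [model.predict(x) for x in test_x]
accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

In a setup closer to the one described, one such model would be fitted per group of pattern types, and accuracy would be reported on each group's own test set.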

