Abstract

This work studies the recognition and classification of domain-specific terms in machine translation, with the aim of improving the accuracy and professionalism of computer-aided translation (CAT) software. First, the current state of machine translation and related fields is analyzed to summarize its main difficulties and shortcomings. Second, the concept of a term is introduced, and the class-imbalance problem in term classification and recognition for machine translation is examined in detail. Third, a term recognition model based on an integrated (ensemble) recognition method is proposed. Finally, the model's classification accuracy and recall are evaluated experimentally using a confusion matrix. When recall, classification accuracy, and F-value are compared across domains, the hybrid method that combines over-sampling and under-sampling achieves its highest classification accuracy on network terms (77%), its lowest on sports terms (71%), and 74% on economic terms. Of the three metrics, recall is the highest, exceeding 80% and reaching 91% for economic terms. Across the different domains, the combination of over-sampling and under-sampling outperforms both under-sampling with replacement and under-sampling without replacement for term recognition and classification. A comparison of the classification results before and after integration shows that combining the base classifiers improves the classification accuracy of terms and improves recall substantially. The proposed term recognition model can help CAT software recognize and translate terms more accurately, has practical value, and provides a reference for research in related fields.
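The abstract names three techniques without giving their details: hybrid over-/under-sampling to counter class imbalance, integration (ensembling) of base classifiers, and confusion-matrix metrics (precision, recall, F-value). The sketch below illustrates each in plain Python under stated assumptions. It is not the authors' implementation. The function names `hybrid_resample`, `majority_vote`, and `per_class_metrics` are hypothetical. Resampling every class to a single `target` size and combining classifiers by simple majority vote are assumptions, because the abstract specifies neither the sampling ratios nor the integration rule.

```python
import random
from collections import Counter, defaultdict


def hybrid_resample(samples, labels, target, seed=0):
    """Hypothetical hybrid resampling: give every class exactly `target` examples.

    Assumption: classes larger than `target` are randomly under-sampled
    (without replacement); smaller classes keep all their examples and are
    randomly over-sampled (with replacement) up to `target`.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target:
            chosen = rng.sample(xs, target)  # under-sample: no duplicates
        else:
            chosen = xs + rng.choices(xs, k=target - len(xs))  # over-sample: add duplicates
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


def majority_vote(predictions):
    """Combine base classifiers; predictions[i][j] is classifier i's label for sample j."""
    # Ties go to the label first seen among the classifiers (Counter ordering).
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall and F-value for one class, from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_value = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_value
```

For example, with `y_true = ["T", "T", "T", "N", "N"]` and `y_pred = ["T", "T", "N", "T", "N"]`, the class `"T"` has 2 true positives, 1 false positive, and 1 false negative. `per_class_metrics(y_true, y_pred, "T")` therefore gives precision, recall, and F-value all equal to 2/3. Keeping every minority example and adding only duplicates, while trimming the majority class without duplicates, is one common way to mix the two sampling strategies. Other schemes, such as synthetic over-sampling, are equally plausible readings of the abstract.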
