Abstract
As the volume of multilingual text grows, the analysis of multilingual data plays a crucial role in statistical translation models, cross-language information retrieval, the construction of parallel corpora, bilingual information extraction, and other fields. In this paper, we introduce a convolutional neural network and propose an auto-associative memory for fusing multilingual data in order to classify multilingual short text. First, the open-source tool word2vec is used to extract word vectors for textual representation. Then, the auto-associative memory relationship extracts the semantics of multilingual documents, which requires calculating the statistical relevance of word vectors between different languages. A critical problem is the domain adaptation of classifiers across languages, which we solve by transforming multilingual text features. To fuse a dense combination of high-level features in multilingual text semantics, we introduce a convolutional neural network into the model, which outputs the classification predictions. Experiments show that the convolutional neural network combined with auto-associative memory improves classification accuracy by 2 to 6% in multilingual text classification compared to other classic models. Furthermore, the proposed model reduces the dependence of multilingual text on parallel corpora and thus extends well to additional multilingual data.
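The pipeline outlined above can be illustrated with a minimal sketch. It is not the model from the paper: the abstract does not specify how statistical relevance is computed or how the network is configured, so the sketch assumes cosine similarity as the relevance measure, assumes the word vectors of both languages are directly comparable, and uses a single convolution layer with max-over-time pooling. The function names `cosine_relevance`, `associate`, and `conv_max_pool` are hypothetical, and random arrays would stand in for word2vec output.

```python
import numpy as np


def cosine_relevance(src_vecs, tgt_vecs):
    """Cosine similarity between every source and target word vector.

    Assumption: cosine similarity stands in for the paper's
    "statistical relevance", which the abstract does not define.
    """
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    return src @ tgt.T  # shape: (n_src_words, n_tgt_words)


def associate(src_vecs, tgt_vecs):
    """Pair each source word with its most relevant target word.

    Returns the source vectors concatenated with the matched target
    vectors, shape (n_src_words, 2 * dim). This is an illustrative
    fusion step, not the paper's auto-associative memory.
    """
    best = cosine_relevance(src_vecs, tgt_vecs).argmax(axis=1)
    return np.concatenate([src_vecs, tgt_vecs[best]], axis=1)


def conv_max_pool(features, filters):
    """One convolution over word windows, ReLU, then max-over-time pooling.

    features: (n_words, dim); filters: (n_filters, window, dim).
    Returns one pooled activation per filter, shape (n_filters,).
    """
    n_filters, window, _ = filters.shape
    n_windows = features.shape[0] - window + 1
    out = np.empty((n_windows, n_filters))
    for i in range(n_windows):
        patch = features[i:i + window]  # (window, dim)
        # Contract each filter with the current window of word features.
        out[i] = np.tensordot(filters, patch, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0).max(axis=0)
```

In a full classifier the pooled vector would feed a softmax layer, and the filters would be learned rather than fixed; both are omitted here to keep the sketch short.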