Cross-language Information Retrieval Research Articles

The exponential growth of data produced by digital media (video, audio and images), physical simulations, scientific instruments and web authoring coincides with a growing interest in cloud computing. The options for distributing and parallelizing information in clouds make retrieval and storage very complicated, especially for real-time data management. The number of Web users accessing data over the Internet is growing steadily, and an enormous amount of that data is available in many languages and can be accessed by anyone at any time. Information Retrieval (IR) is concerned with finding useful information in a large collection of unstructured, structured and semi-structured data. Today, the diversity of data and language barriers are major obstacles to communication and social exchange around the world. To overcome these barriers, cross-language information retrieval (CLIR) systems are now in strong demand. Query Expansion (QE) is the process of adding related and relevant terms to the original query to improve its indexing capability and thereby the relevance of the documents retrieved in CLIR. In this work, QE is investigated for Hindi-English and Kannada-English CLIR, in which Hindi and Kannada queries are used to search English documents. After the query is translated, the retrieved results are ranked with Okapi BM25 so that the most relevant documents appear at the top, and QE is used to further increase the relevance of the retrieved documents. We propose an architecture for Hindi-English and Kannada-English CLIR that uses QE to improve the relevance of retrieved documents. In the first experiment, QE is performed with and without Okapi BM25 ranking. The results show that the relevance of the retrieved documents is higher with Okapi BM25 than without ranking.
The experiments clearly demonstrate that the performance of the Hindi-English and Kannada-English CLIR system can be improved significantly by expanding the query with appropriate terms placed at suitable positions, and that the retrieved snippets can serve very well as a continuous test collection.
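The translation, ranking and expansion steps described above can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the toy Hindi-English dictionary `HI_EN`, the helper names `translate_query` and `expand_query`, and the use of pseudo-relevance feedback (adding frequent terms from the top-ranked documents) as the expansion strategy are all assumptions, since the abstract does not say how expansion terms are chosen. The BM25 scorer uses the common defaults k1 = 1.2 and b = 0.75 and a non-negative IDF variant.

```python
import math
from collections import Counter

# Hypothetical toy Hindi -> English dictionary; a real system would use a
# large bilingual lexicon or a machine-translation service.
HI_EN = {
    "जल": ["water"],
    "प्रदूषण": ["pollution", "contamination"],
}


def translate_query(terms, lexicon):
    """Dictionary-based query translation that keeps every listed sense."""
    out = []
    for t in terms:
        out.extend(lexicon.get(t, [t]))  # untranslatable terms pass through
    return out


class BM25:
    """Okapi BM25 over pre-tokenized documents."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.tfs = [Counter(d) for d in docs]
        self.lens = [len(d) for d in docs]
        self.n_docs = len(docs)
        self.avgdl = sum(self.lens) / self.n_docs
        self.df = Counter(t for d in docs for t in set(d))
        self.k1, self.b = k1, b

    def idf(self, term):
        n = self.df.get(term, 0)
        # The "+ 1" inside the log keeps IDF non-negative.
        return math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1)

    def score(self, query, i):
        tf, dl = self.tfs[i], self.lens[i]
        total = 0.0
        for t in query:
            f = tf.get(t, 0)
            if f:
                # Term-frequency saturation (k1) and length normalization (b).
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf(t) * f * (self.k1 + 1) / norm
        return total

    def rank(self, query):
        """Document indices, most relevant first (ties keep input order)."""
        return sorted(range(self.n_docs),
                      key=lambda i: self.score(query, i), reverse=True)


def expand_query(query, bm25, docs, top_k=2, n_terms=1):
    """Pseudo-relevance feedback (assumed QE strategy): append the most
    frequent new terms found in the top_k ranked documents."""
    top = bm25.rank(query)[:top_k]
    counts = Counter(t for i in top for t in docs[i] if t not in query)
    return query + [t for t, _ in counts.most_common(n_terms)]


docs = [
    "river water pollution from factories".split(),
    "river water quality monitoring".split(),
    "cricket match report".split(),
    "air pollution in big cities".split(),
]
query = translate_query(["जल", "प्रदूषण"], HI_EN)
bm25 = BM25(docs)
print(bm25.rank(query))                 # [0, 1, 3, 2]
print(expand_query(query, bm25, docs))  # ['water', 'pollution', 'contamination', 'river']
```

Here "river" is added because it occurs in both top-ranked documents; a production system would normally also remove stopwords and weight expansion terms rather than appending them with equal weight.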

Translation language resources, such as bilingual word lists and parallel corpora, are important factors affecting the effectiveness of cross-language information retrieval (CLIR) systems. Developing an effective CLIR system is particularly difficult when large, domain-appropriate parallel corpora are not available, and creating a large parallel corpus is costly and requires considerable effort. We therefore demonstrate the construction of parallel corpora from Wikipedia, together with improved translation of the queries used in a CLIR system. To do so, we first constructed a bilingual dictionary, termed WikiDic. We then evaluated individual language resources, and combinations of them, in terms of their ability to extract parallel sentences; the combination of our proposed WikiDic with translation probabilities derived from bilingual example sentence pairs on the Web was found to be best suited to parallel sentence extraction. Finally, to evaluate the parallel corpus generated from this best combination of language resources, we compared its performance in query translation for CLIR with that of a manually created English–Korean parallel corpus. The corpus generated by our proposed method performed better than the manually created corpus, demonstrating the effectiveness of the proposed method for automatic parallel corpus extraction. The method can inform the construction of other parallel corpora from readily available language resources, and the parallel sentence extraction will naturally improve as Wikipedia continues to be used and its content develops.
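A dictionary-based scorer for candidate sentence pairs can be sketched as follows, assuming both sides are already tokenized and lowercased (Korean split into morphemes). The `WIKIDIC` entries, the `TRANS_PROB` values, the interpolation weight `lam` and the threshold are hypothetical placeholders. The abstract does not specify how WikiDic and the translation probabilities are combined, so the linear interpolation of a dictionary hit with the best translation probability shown here is an illustrative choice, not the published method.

```python
# Hypothetical WikiDic-style entries (English -> set of Korean terms).
WIKIDIC = {
    "river": {"강"},
    "water": {"물"},
    "seoul": {"서울"},
}

# Hypothetical word translation probabilities p(ko | en), as might be
# estimated from bilingual example sentence pairs; values are made up.
TRANS_PROB = {("river", "강"): 0.8, ("water", "물"): 0.9, ("city", "도시"): 0.7}


def pair_score(en, ko, lam=0.5):
    """Average per-token evidence that an English token is translated in ko:
    lam * (WikiDic hit) + (1 - lam) * (best translation probability)."""
    if not en:
        return 0.0
    ko_set = set(ko)
    total = 0.0
    for w in en:
        dic_hit = 1.0 if WIKIDIC.get(w, set()) & ko_set else 0.0
        prob = max((TRANS_PROB.get((w, k), 0.0) for k in ko_set), default=0.0)
        total += lam * dic_hit + (1 - lam) * prob
    return total / len(en)


def extract_parallel(en_sents, ko_sents, threshold=0.2):
    """Keep (en_index, ko_index, score) for each English sentence whose
    best-scoring Korean candidate reaches the threshold."""
    pairs = []
    if not ko_sents:
        return pairs
    for i, en in enumerate(en_sents):
        scores = [pair_score(en, ko) for ko in ko_sents]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


# Pre-tokenized, lowercased input; Korean is split into morphemes.
en_sents = [["the", "river", "water", "is", "clean"],
            ["seoul", "is", "a", "city"]]
ko_sents = [["서울", "은", "도시", "이다"],
            ["강", "물", "은", "깨끗하다"]]
for i, j, s in extract_parallel(en_sents, ko_sents):
    print(i, j, round(s, 2))
```

Each English sentence is paired with its best-scoring Korean candidate, so two English sentences could map to the same Korean sentence; a real extractor would typically enforce one-to-one alignment and restrict candidates to sentences from interlanguage-linked article pairs.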

Articles published on Cross-language Information Retrieval

Spoken language identification based on the transcript analysis

A Comparative Optimization Model of Japanese Literature Characteristics for Cognitive Retrieval of Cross-Language Information.

An Ontology based Smart Management of Linguistic Knowledge

Mind the Gap: Cross-Lingual Information Retrieval with Hierarchical Knowledge Enhancement

Neural topic-enhanced cross-lingual word embeddings for CLIR

Mining an English-Chinese parallel Dataset of Financial News

Learning Bilingual Word Embedding Mappings with Similar Words in Related Languages Using GAN

Transliteration: A Magnetic Analysis

A Review on Indexing Techniques and its application in Multilingual Information Retrieval System

English-Vietnamese Cross-Lingual Paraphrase Identification Using MT-DNN

Characteristics recognition and soft multimedia system for Japanese machine translation and edge-driven hardware implementations

Efficient bilingual lexicon extraction from comparable corpora based on formal concepts analysis

Exploring Topic-language Preferences in Multilingual Swahili Information Retrieval in Tanzania

English Audio Language Retrieval Based on Adaptive Speech-Adjusting Algorithm

A Learning to rank framework based on cross-lingual loss function for cross-lingual information retrieval

Semantic morphological variant selection and translation disambiguation for cross-lingual information retrieval

Effective preprocessing based neural machine translation for English to Telugu cross-language information retrieval

Kelantan and Sarawak Malay Dialects: Parallel Dialect Text Collection and Alignment Using Hybrid Distance-Statistical-Based Phrase Alignment Algorithm

Cloud Based Multi-Language Indexing Using Cross Lingual Information Retrieval Approaches

Parallel sentence extraction to improve cross-language information retrieval from Wikipedia