Cross-language Information Retrieval Method Research Articles

Web services have become one of the important methods of communication over the internet, and their usage has increased among both users and developers. Semantic web services represent the second generation of web services, carrying more description and information about their contents. Searching for and working with web services is done through a process called web service discovery, which returns a Semantic Web Service Description Language (SWSDL) file for each web service. This research aims to expand semantic web service usage by adding multilanguage capability to the web service discovery process and by recommending other web services to users based on their history of using web services. These aims were achieved by modifying the web service discovery model with two techniques: the Cross Language Information Retrieval (CLIR) technique and the data mining association rules technique. The research proposes two sub-models. The first applies CLIR techniques and information retrieval methods to support a bilingual web service discovery process; the second language proposed here is Arabic. Text mining techniques were applied to the SWSDL content and the user's query to prepare them for the CLIR methods. This sub-model was tested on a curated catalogue of Life Science Web Services (http://www.biocatalogue.org/), achieving 99.38 % accuracy and 87.23 precision of the effectiveness of the monolingual system. The second sub-model proposes a web service recommendation process that applies data mining techniques to suggest another web service, besides the one returned by the discovery process, based on the user's history. This sub-model was tested on the same curated web services site, and 65 % of users chose services from those recommended by the proposed sub-model.
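As a rough illustration of the recommendation idea in the abstract above, the following minimal sketch mines pairwise association rules from usage histories and uses them to suggest a further service. The abstract does not say which rule-mining algorithm or thresholds were used, so the pairwise simplification, the service names, the histories, and the functions `mine_pair_rules` and `recommend` are all assumptions made for this example.

```python
from collections import Counter
from itertools import permutations

# Hypothetical usage histories: each set holds the services one user invoked.
histories = [
    {"blast", "clustalw", "interpro"},
    {"blast", "clustalw"},
    {"blast", "interpro"},
    {"clustalw", "muscle"},
    {"blast", "clustalw", "muscle"},
]

def mine_pair_rules(histories, min_support=0.4, min_confidence=0.6):
    """Return {(a, b): confidence} for rules 'used a => also used b'."""
    n = len(histories)
    item_counts = Counter(s for h in histories for s in h)
    # Count ordered pairs so that a => b and b => a are scored separately.
    pair_counts = Counter(
        p for h in histories for p in permutations(sorted(h), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n                 # share of users with both a and b
        confidence = count / item_counts[a]  # share of a's users who also used b
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules

def recommend(discovered, rules):
    """Services suggested alongside the discovered one, best rule first."""
    candidates = [(conf, b) for (a, b), conf in rules.items() if a == discovered]
    return [b for conf, b in sorted(candidates, reverse=True)]

rules = mine_pair_rules(histories)
print(recommend("interpro", rules))
```

In a fuller system the `discovered` argument would be the service returned by the discovery step, and longer itemsets (not only pairs) would normally be considered.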


Cross-lingual information retrieval is a difficult task, typically involving query translation into multiple languages followed by monolingual retrieval in each language. Latent Semantic Analysis allows cross-lingual retrieval without translating queries by working from an already existing corpus of translations. Thus, collecting such a corpus obviates the need to construct complicated translation tools, making this technique particularly applicable to querying less commercially appealing languages. First, we extend work on retrieval from an English-French corpus split into training and test sets to examine the effects of training on a completely different corpus. Success is measured by the proportion of direct translations correctly considered most similar by Latent Semantic Analysis. Secondly, an English-only similarity task from the literature is also extended to train on a different corpus from the one being tested on. Here the degradation in performance is measured by examining the variation in the correlations between the inter-document similarity judgements calculated by Latent Semantic Analysis and an experimentally derived baseline of human judgements of inter-document similarity. Higher order indexing schemes discarding uncommon terms, sparse matrix representations and the removal of factors with very low eigenvalues are used to enhance efficiency. Performance degradation from exogenous training is shown in both cases. The best results occur using stopping, log-entropy weighting and over 500 factors.
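The cross-lingual Latent Semantic Analysis idea described in the abstract above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the four-sentence parallel corpus, the raw term counts (instead of the log-entropy weighting the abstract mentions), the choice of `k = 3` factors, and the helper names `fold_in`, `cosine` and `mate_retrieval_accuracy` are all invented for the example.

```python
import numpy as np

# Hypothetical toy parallel corpus: (English, French) translation pairs.
train_pairs = [
    ("the cat sleeps", "le chat dort"),
    ("the dog eats", "le chien mange"),
    ("the cat eats fish", "le chat mange poisson"),
    ("the dog sleeps", "le chien dort"),
]

def bag_of_words(text, vocab):
    """Raw term-count vector over vocab; unknown words are ignored."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in vocab:
            v[vocab[w]] += 1.0
    return v

# Each training document joins a text with its translation, so terms from
# both languages share one latent space.
merged = [en + " " + fr for en, fr in train_pairs]
words = sorted({w for d in merged for w in d.split()})
vocab = {w: i for i, w in enumerate(words)}
A = np.column_stack([bag_of_words(d, vocab) for d in merged])  # terms x docs

# Truncated SVD: keep the k largest factors (assumed value for this toy case).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 3
Uk, sk = U[:, :k], s[:k]

def fold_in(text):
    """Project a monolingual text into the k-dimensional latent space."""
    return bag_of_words(text, vocab) @ Uk / sk

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def mate_retrieval_accuracy(pairs):
    """Proportion of English texts whose most similar French text is
    their direct translation."""
    fr_vecs = [fold_in(fr) for _, fr in pairs]
    hits = 0
    for i, (en, _) in enumerate(pairs):
        sims = [cosine(fold_in(en), f) for f in fr_vecs]
        hits += int(np.argmax(sims)) == i
    return hits / len(pairs)

print(mate_retrieval_accuracy(train_pairs))
```

No query is translated at any point: an English text and a French text are compared only through their positions in the shared latent space. Evaluating `mate_retrieval_accuracy` on pairs drawn from a different corpus than `train_pairs` would correspond to the exogenous-training setting the abstract studies.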


Articles published on Cross-language Information Retrieval Method

An axiomatic approach to corpus-based cross-language information retrieval

Cross-Language Semantic Web Service Discovery to Improve the Selection Mechanism by using Data Mining Techniques

QUERY TRANSLATION USING CONCEPTS SIMILARITY BASED ON QURAN ONTOLOGY FOR CROSS-LANGUAGE INFORMATION RETRIEVAL

Cross-lingual latent semantic analysis

Improving query translation in English–Korean cross-language information retrieval
