Multilingual Text Mining

Federico Neri

doi:10.2495/data050091

Abstract

The availability of huge amounts of textual data from a bewildering variety of sources leads to a well-known paradox: an overload of information yields no usable knowledge. In fact, up to 80% of electronic data is textual. Moreover, the most valuable information is encoded in pages written in various native languages, yet relevant to non-native speakers as well. The process of accessing all these raw data, heterogeneous in language, and transforming them into information is therefore inextricably linked to the concepts of textual analysis and synthesis, and hinges greatly on the ability to master the problems of multilingualism. Through multilingual text mining, users can get an overview of large volumes of textual data in the form of a highly readable grid, which helps them discover meaningful similarities among documents and find all related information. This paper describes the approach used by SYNTHEMA for multilingual text mining, showing the classification results on around 600 breaking news items written in English, Italian and French.

1 Multilingual resources construction

Generally speaking, the manual construction and maintenance of multilingual language resources is expensive and requires considerable effort. SYNTHEMA was established in 1994 by computer scientists from the IBM Research Center, with the expertise and skills needed to provide effective software solutions and to carry out R&D in the Natural Language Processing area. It has been involved in Machine Translation, Information Extraction and Text Mining activities since 1996, primarily in the field of Technology Watch. The growing availability of comparable and parallel corpora has pushed SYNTHEMA to develop specific methods for the semi-automatic updating of lexical resources. These methods are based on Natural Language Understanding and Machine Learning. These techniques detect multilingual lexicons from such corpora, by extracting all the

© 2005 WIT Press, WIT Transactions on Information and Communication Technologies, Vol 35, www.witpress.com, ISSN 1743-3517 (on-line), Data Mining VI, 89
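The general idea of detecting a bilingual lexicon from a parallel corpus can be illustrated with a minimal sketch. This is not SYNTHEMA's method, which the text describes only as based on Natural Language Understanding and Machine Learning. It is a simple co-occurrence baseline that assumes a sentence-aligned corpus and whitespace tokenization, and scores word pairs with the Dice coefficient. The function name `dice_lexicon`, the `min_score` threshold and the toy English-Italian corpus are all hypothetical.

```python
from collections import Counter
from itertools import product


def dice_lexicon(aligned_pairs, min_score=0.5):
    """Map each source word to its best-scoring target word.

    aligned_pairs: iterable of (source_sentence, target_sentence) strings,
    assumed to be translations of each other (hypothetical input format).
    Returns {source_word: (target_word, dice_score)}.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_sentence, tgt_sentence in aligned_pairs:
        # Count each word at most once per sentence.
        src_words = set(src_sentence.lower().split())
        tgt_words = set(tgt_sentence.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Every source word co-occurs with every target word of the pair.
        pair_count.update(product(src_words, tgt_words))

    lexicon = {}
    for (s, t), joint in pair_count.items():
        # Dice coefficient: 2 * |co-occurrences| / (|s| + |t|).
        score = 2 * joint / (src_count[s] + tgt_count[t])
        if score >= min_score and score > lexicon.get(s, ("", 0.0))[1]:
            lexicon[s] = (t, score)
    return lexicon


# Toy sentence-aligned corpus (illustrative data only).
corpus = [
    ("the house", "la casa"),
    ("the dog", "il cane"),
    ("white house", "casa bianca"),
    ("white dog", "cane bianco"),
]
lexicon = dice_lexicon(corpus)
print(lexicon["house"], lexicon["dog"])
```

On this toy corpus, "house" pairs with "casa" and "dog" with "cane", each with a score of 1.0, because they always appear together. Words such as "the" and "white" have tied candidates, which is where a real system would need morphological or syntactic analysis rather than raw counts.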


Journal: WIT Transactions on Information and Communication Technologies
Publication Date: May 4, 2005
Citations: 1
