Multilingual Support Research Articles

Character sets are one of the basic issues in information interchange. Most current national standard character sets extend 7-bit ASCII. These extensions conflict with each other and complicate the design of multilingual information systems. Unicode, or the Universal Character Set (UCS), is a character set that covers the symbols of the major written languages. Text files and strings usually have no header to indicate which character set is in use, so they currently default to one of the national standards. The transition from national standards to Unicode may take longer than expected. This paper presents the following methods to help the transition. (1) A text file format of fixed-width characters: if the first character in a text file is a nonzero control code, the file is in UCS; otherwise, it is in the default national standard. The control code indicates which UCS subset or byte order is in use. (2) A tagged string storage: each string has a tag representing which character set or coding format is in use, e.g., the default national standard, the 8-bit subset of UCS-2, UCS-2, or UCS-4. (3) A method for assigning the format of string literals: all string literals use the same syntax notation, and their storage format is the same as that of their source files. These methods can improve multilingual support without introducing much complexity. Copyright © 2000 John Wiley & Sons, Ltd.
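
Methods (1) and (2) above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the control-code values, so the `HYPOTHETICAL_MARKERS` table, the single-byte marker, the `koi8_r` default, and the names `decode_text`, `TaggedString`, and `tag_string` are all assumptions made for illustration. Python's `utf-16-be` and `utf-32-be` codecs stand in for UCS-2 and UCS-4, and `latin-1` for the 8-bit subset of UCS-2.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical control codes: the abstract says only that a nonzero control
# code selects the UCS subset or byte order, not which codes are used.
HYPOTHETICAL_MARKERS = {
    0x01: "latin-1",    # 8-bit subset of UCS-2
    0x02: "utf-16-be",  # stand-in for UCS-2, big-endian
    0x03: "utf-16-le",  # stand-in for UCS-2, little-endian
    0x04: "utf-32-be",  # stand-in for UCS-4, big-endian
}


def decode_text(data: bytes, national_codec: str = "koi8_r") -> Tuple[str, str]:
    """Return (codec, text), choosing the codec from the leading marker byte.

    Assumes, for simplicity, that the marker occupies exactly one byte.
    Without a known marker the data is read in the default national standard.
    """
    if data and data[0] in HYPOTHETICAL_MARKERS:
        codec = HYPOTHETICAL_MARKERS[data[0]]
        return codec, data[1:].decode(codec)
    return national_codec, data.decode(national_codec)


@dataclass(frozen=True)
class TaggedString:
    """A string stored together with a tag naming its coding format."""
    tag: str
    payload: bytes

    def text(self) -> str:
        return self.payload.decode(self.tag)


def tag_string(text: str) -> TaggedString:
    """Store text in the narrowest fixed-width format that can hold it."""
    widest = max(map(ord, text), default=0)
    if widest <= 0xFF:
        codec = "latin-1"
    elif widest <= 0xFFFF:
        codec = "utf-16-be"
    else:
        codec = "utf-32-be"
    return TaggedString(codec, text.encode(codec))
```

Picking the narrowest format in `tag_string` is one possible policy, chosen here so that plain 8-bit text costs no extra storage; a real system could equally keep strings tagged with the default national standard until a conversion is needed.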

Fast access to information in different languages is still a major problem for many organizations. We have built a multilingual analyst's workstation integrated into the Tipster document management toolkit. The workstation offers an English-speaking analyst a variety of tools for browsing sets of documents in Arabic, Japanese, Spanish and Russian, including a Unicode-based multilingual editor and simple machine translation functionality. The Temple project has developed an open multilingual architecture and software support for rapid development of extensible machine translation functionalities. The targeted languages are those for which natural language processing and human resources are scarce or difficult to obtain. The goal is to support the development of machine translation functionalities in a very short time and with limited resources. Glossary-based machine translation (GBMT) is used to provide an English gloss of a foreign document. A GBMT system uses a bilingual phrasal dictionary (glossary) to produce a phrase-by-phrase translation. Translation, which is based on phrase pattern-matching, is fast and accurate with respect to the content of the document, and browsed documents can be translated almost in real time. A GBMT system for a language pair is also extremely simple, cheap and fast to develop. Moreover, all language resources used by the system are entirely under the control of the user.
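
The phrase-by-phrase idea behind GBMT can be sketched in a few lines of Python. This is an illustration only, not the Temple engine: the abstract does not describe its pattern matcher, so greedy longest-match lookup is swapped in here as a simple stand-in, and the `gloss` function, the `max_len` parameter, the angle-bracket marking of unknown words, and the small Spanish-English glossary are all hypothetical.

```python
from typing import Dict, List, Tuple

# A glossary maps a source-language phrase (as a tuple of lowercase tokens)
# to its English gloss.
Glossary = Dict[Tuple[str, ...], str]


def gloss(tokens: List[str], glossary: Glossary, max_len: int = 4) -> List[str]:
    """Translate phrase by phrase using greedy longest-match lookup.

    At each position the longest glossary phrase (up to max_len tokens) is
    preferred. Tokens with no entry are passed through in angle brackets so
    the analyst can see what was left untranslated.
    """
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(t.lower() for t in tokens[i:i + n])
            if phrase in glossary:
                out.append(glossary[phrase])
                i += n
                break
        else:
            # No phrase of any length matched at this position.
            out.append(f"<{tokens[i]}>")
            i += 1
    return out


# Hypothetical glossary entries, invented for this example.
EXAMPLE_GLOSSARY: Glossary = {
    ("buenos", "días"): "good morning",
    ("el",): "the",
    ("ministro",): "minister",
    ("de", "asuntos", "exteriores"): "of foreign affairs",
}

if __name__ == "__main__":
    sentence = "Buenos días el ministro de asuntos exteriores llegó".split()
    print(" ".join(gloss(sentence, EXAMPLE_GLOSSARY)))
    # prints: good morning the minister of foreign affairs <llegó>
```

Because the glossary is an ordinary dictionary, adding or correcting an entry changes the output immediately, which matches the abstract's point that the language resources stay under the user's control.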

Articles published on Multilingual Support

Transition from national standards to Unicode: multilingual support in operating systems and programming languages

Active Learning Centre: design and evaluation of an educational World Wide Web site.

Toward a Multilingual, Experiential Environment for Learning Decision Technology

Glossary-Based MT Engines in a Multilingual Analyst's Workstation Architecture

Enhancing Business Communication with Group Decision Support Systems

Design of a bitmapped multilingual workstation
