Multilingual Web sites are expected to provide the same content expressed in various languages, presented according to a common style, with the same interaction facilities. To this end, most Web developers start from a source-language version of the site and produce the multilingual versions by providing translations in all supported languages. Translated pages are usually generated by replicating the HTML structure and the scripting language sections of the original pages and by translating the textual sections into the target languages. This practice exposes the site to several problems during its evolution. Updates may not be properly propagated to all translations, and unwanted divergences can be introduced over time in content, presentation and interaction. In this paper, we propose a prototype toolkit, limited to Western languages, that helps restructure an existing static Web site and migrate its multilingual content to a unified and consistent representation. First, pages are classified according to the language of their content. Then, correspondences between pages in the original language and their translations are determined. Based on the computation of the edit operations necessary to make each page consistent with its translations, the site is updated to a new version in which all pages are aligned. In the last phase, a unified representation of the structure and of the multilingual content of each page is inserted into a Content Management System. This ensures a consistent future evolution of the site. The prototype toolkit has been tested on 10 existing static Web sites, with texts in Italian, English, German and Spanish. For some of the above-mentioned phases, alternative solutions have been considered and their relative advantages have been evaluated against a manually constructed gold standard. We are quite confident that, with some adaptation, most of the results we obtained can be extended to any pair of Western languages.
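The alignment phase described above, in which edit operations are computed to make a page consistent with its translation, can be illustrated with a minimal sketch. It is not the toolkit's actual algorithm: it assumes that a page's structure can be approximated by its flat sequence of HTML tags, and it swaps in the generic sequence diff from Python's standard `difflib` module. The names `TagSequence`, `tag_sequence` and `structural_edits`, as well as the two sample pages, are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collect opening and closing tags in document order, ignoring text."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")


def tag_sequence(html):
    """Return the flat tag sequence of an HTML string (an assumed proxy
    for page structure; attributes and text are deliberately dropped)."""
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return parser.tags


def structural_edits(source_html, translation_html):
    """List the edit operations that would turn the translation's tag
    sequence into the source's tag sequence. Each item is a tuple
    (operation, tags_in_translation, tags_in_source)."""
    src = tag_sequence(source_html)
    dst = tag_sequence(translation_html)
    matcher = SequenceMatcher(a=dst, b=src, autojunk=False)
    return [
        (op, dst[i1:i2], src[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"  # keep only the divergences
    ]


# Hypothetical pages: the Italian translation lacks the second paragraph.
source = "<html><body><h1>Welcome</h1><p>News</p><p>Contact</p></body></html>"
translation = "<html><body><h1>Benvenuti</h1><p>Notizie</p></body></html>"
edits = structural_edits(source, translation)
```

For these two sample pages, the only reported operation is the insertion of the one paragraph element that the translation is missing. Comparing tags while ignoring text is what lets the diff work across languages, although a real site would probably need a tree-based comparison that also accounts for attributes and scripts.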
Copyright © 2005 John Wiley & Sons, Ltd.