Syntactic and Semantic Ambiguity Processing in Neural MT from English to French, Spanish and Italian
Despite recent advances in artificial intelligence, human translators outperform machine translation (MT) on at least three types of tasks: identifying referents in anaphora (especially of the interphrastic kind), resolving semantic ambiguity (mainly due to polysemy or homonymy), and resolving syntactic ambiguity (especially with poorly inflected source languages such as English). Using output from two freely available online machine translation systems, Google Translate and DeepL, we examine how the latter two types of ambiguity are handled in translation from English into French, Spanish and Italian.

Our results show that both systems perform well overall in resolving the simplest cases of syntactic ambiguity; difficulties arise more frequently for noun phrases featuring atypical syntactic divisions and rarely used collocations. We study MT output for ambiguous structures involving verb roots followed by the -ING morpheme (flying planes, growing pains), as well as syntactic structures in which two or more nouns are preceded by one or more adjectives. MT handles the longest of these structures (ADJ ADJ N N N N) relatively well, probably because their subsets occur in the bilingual or target-language monolingual corpora that underlie MT systems. Structures involving head modification and coordination (ADJ N AND N) are also known to pose problems for MT and human translators alike. But since many of the most frequent N AND N structures involve co-hyponyms (men and women, brothers and sisters), antonyms (rights and duties, costs and benefits) or near-synonyms (aid and advice), their translation as a whole unit generally triggers the choice of correct syntactic dependencies. Structures in which the adjective modifies only the first noun (fresh air and exercise, social sciences and humanities) are much less frequent and are probably also translated as whole units.
Structures involving premodification, coordination and postmodification may give rise to four distinct structural types, depending on which long-range dependencies apply (detailed [knowledge and understanding] of the IT industry, [ethnic group] and [place of birth], invaluable [context and [source of information]], [[close friend] and confidant] of Mr Jones). Structures in which both long-range dependencies apply (integrated prevention and control of pollution) are the ones that most frequently cause MT errors.

Semantic ambiguity has been processed with increasing success by MT, especially when collocates vary widely across the two main meanings of a homonym (a well-known example is the word pen). Processing polysemy (for instance the medical use of conditions in pre-existing conditions) is more of a challenge. Other cases, in which several polysemous terms are concentrated in the same sentence (Changing the placement of beams relative to the staff involves changing the direction of the stems in the beam), also create difficulties for MT when the polysemous terms appear without any of their usual collocates (here in the specialised field of music notation). Homonymy involving a change of grammatical category (N-to-V or V-to-N conversion) seems to remain the greatest difficulty for neural MT, despite its increasing consideration of intra- and extraphrastic context. Potentially ambiguous word sequences (treatment increase, as in 'as the daily dose and duration of treatment increase'), which were processed incorrectly before neural MT, are now correctly translated. But word sequences in which one word belongs to a part of speech that is not its most common one (e.g. the noun remains in what remains can be considered) may still cause occasional errors.
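The four attachment patterns behind these examples can be sketched as a minimal enumeration: the premodifying adjective and the postmodifier each take either narrow scope (one noun) or wide scope (the whole coordination). The following Python fragment is a hypothetical illustration of that combinatorics, not part of the study's method; the function name and bracketing notation are our own.

```python
def enumerate_readings(adj, n1, n2, post):
    """Four readings of 'ADJ N1 and N2 POST': the premodifying adjective
    and the postmodifier each take narrow scope (one noun) or wide scope
    (the whole coordination)."""
    return {
        ("narrow", "narrow"): f"[{adj} {n1}] and [{n2} {post}]",
        ("narrow", "wide"): f"[[{adj} {n1}] and {n2}] {post}",
        ("wide", "narrow"): f"[{adj} [{n1} and [{n2} {post}]]]",
        ("wide", "wide"): f"[{adj} [{n1} and {n2}]] {post}",
    }

# The error-prone type from the abstract: both dependencies are long-range.
for scopes, bracketing in enumerate_readings(
        "integrated", "prevention", "control", "of pollution").items():
    print(scopes, "->", bracketing)
```

The (wide, wide) reading, [integrated [prevention and control]] of pollution, is the type the abstract identifies as causing the most MT errors, since both dependencies must span the coordination.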
Several examples involving the verb founder are studied; they frequently trigger a translation of the noun founder in all three Romance languages (or translations of the verbs find or found, due to incorrect segmentation).