Abstract
Cognates are words in different languages that have similar spelling and meaning. They can help second-language learners with vocabulary expansion and reading comprehension tasks. Special attention needs to be paid to pairs of words that appear similar but are in fact false friends: they have different meanings in all contexts. Partial cognates are pairs of words in two languages that have the same meaning in some, but not all, contexts. Detecting the actual meaning of a partial cognate in context can be useful for Machine Translation and Computer-Assisted Language Learning tools. Our research on cognates and false friends between a pair of languages (French and English in our case) consists of automatically classifying a pair of words from the two languages as cognates or false friends. We use Machine Learning techniques with several measures of orthographic similarity as features for classification. We study the impact of selecting different features, averaging them, and combining them through Machine Learning techniques. The methods work on different pairs of languages as long as a small number of annotated word pairs is provided as training data. In addition to the work on cognate and false-friend identification, we propose a supervised method and a semi-supervised method that uses bootstrapping for disambiguating partial cognates between French and English. The proposed methods use only automatically labeled data and can therefore be applied to other pairs of languages as well. The data that we use is automatically collected from parallel corpora. We also take into account the impact of data collected from different domains. To complement our studies on cognates, false friends, and partial cognates, we developed an annotation tool for these special types of words. The tool can automatically annotate cognates, false friends, and partial cognates in any French text.
The tool uses UIMA (Unstructured Information Management Architecture) from IBM and BaLIE (an open-source Java project designed to extract information from free text).
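As an illustration of what "measures of orthographic similarity as features" can look like, the following minimal Python sketch computes three commonly used measures for a French-English word pair and their plain average. The choice of measures (normalized edit-distance similarity, longest common subsequence ratio, and bigram Dice coefficient) and all function names are assumptions made for this example; the abstract does not state which measures were used. In a setup like the one described, such a feature dictionary would be passed to a classifier trained on annotated pairs.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def dice_bigrams(a: str, b: str) -> float:
    """Dice coefficient over character bigrams (multiset version)."""
    ba = Counter(a[i:i + 2] for i in range(len(a) - 1))
    bb = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:  # both words shorter than two characters
        return 1.0 if a == b else 0.0
    return 2 * sum((ba & bb).values()) / total


def orthographic_features(fr: str, en: str) -> dict:
    """Hypothetical feature vector for one word pair (illustrative only)."""
    longest = max(len(fr), len(en)) or 1
    return {
        "edit_sim": 1 - edit_distance(fr, en) / longest,
        "lcsr": lcs_length(fr, en) / longest,
        "dice": dice_bigrams(fr, en),
    }


def average_similarity(fr: str, en: str) -> float:
    """Unweighted mean of the features, a simple baseline score."""
    feats = orthographic_features(fr, en)
    return sum(feats.values()) / len(feats)
```

A high score on these measures only indicates similar spelling. Separating cognates from false friends still requires labeled examples, since both kinds of pairs look alike orthographically.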