Abstract

Abstract To address multilingual document classification in an effcient and effective manner, we claim that a synergy between classical IR techniques such as vector model and some advanced data mining methods, especially Formal Concept Analysis, is particularly appropriate. We propose in this paper, a new statistical approach for extracting inter-language clusters from multilingual documents based on Closed Concepts Mining and vector model. Formal Concept Analysis techniques are applied to extract Closed Concepts from comparable corpora; and, then, exploit these Closed Concepts and vector models in the clustering and alignment of multilin- gual documents. An experimental evaluation is conducted on the collection of bilingual documents French-English of CLEF’2003. The results confirmed that the synergy between Formal Concept Analysis and vector model is fruitful to extract bilingual classes of documents, with an interesting comparability score.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.