Abstract

Classical grammar for natural languages, as defined by linguists, is widely used in many natural language processing (NLP) tasks, such as information extraction, machine translation, and parsing. Classical grammar is well defined, but it is context-free and does not cover complex patterns that span multiple linguistic units. Conversely, many simple patterns are not included in classical grammar yet are useful in NLP tasks. Recognizing such special linguistic patterns in natural language is therefore an important step in various NLP systems. We propose an unsupervised method that automatically discovers complex monolingual linguistic patterns from a bilingual corpus that has been parsed with classical grammar and aligned. Every pattern discovered in one language is qualified by the parallel text in the other language. A specialized, efficient algorithm mines frequent bilingual subtrees from the resulting forest of parse trees, and the subtrees found are formalized as linguistic patterns.
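The abstract does not spell out the mining algorithm, so the following is only a rough illustration of the general idea, not the authors' specialized method. It assumes that each sentence pair comes with a node alignment between its two parse trees. It also restricts patterns to depth-limited fragments rooted at aligned nodes, and it keeps a source/target fragment pair when that pair occurs in at least `min_support` sentence pairs. The names `mine_bilingual_patterns`, `fragment`, `node_at`, `min_support`, and `depth`, as well as the toy trees and alignments, are hypothetical. The counting is brute force.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A parse-tree node: (label, children). Leaves have an empty children tuple.
Tree = Tuple[str, tuple]
Path = Tuple[int, ...]  # child indices from the root, e.g. (0, 1)


def leaf(label: str) -> Tree:
    return (label, ())


def node_at(tree: Tree, path: Path) -> Tree:
    """Follow a path of child indices down from the root."""
    for i in path:
        tree = tree[1][i]
    return tree


def fragment(node: Tree, depth: int) -> Tree:
    """Cut the subtree rooted at `node` down to `depth` levels (assumed pattern shape)."""
    label, children = node
    if depth == 0:
        return (label, ())
    return (label, tuple(fragment(c, depth - 1) for c in children))


def mine_bilingual_patterns(
    corpus: List[Tuple[Tree, Tree, List[Tuple[Path, Path]]]],
    min_support: int = 2,
    depth: int = 1,
) -> Dict[Tuple[Tree, Tree], int]:
    """Count aligned (source, target) fragment pairs and keep the frequent ones.

    Support is the number of sentence pairs that contain a fragment pair,
    so a pair repeated inside one sentence pair is counted only once.
    """
    support: Counter = Counter()
    for src, tgt, alignment in corpus:
        seen = set()
        for src_path, tgt_path in alignment:
            pair = (fragment(node_at(src, src_path), depth),
                    fragment(node_at(tgt, tgt_path), depth))
            seen.add(pair)
        support.update(seen)
    return {pair: n for pair, n in support.items() if n >= min_support}


# Toy English/French sentence pairs (hypothetical) with NP and VP nodes aligned.
en1 = ("S", (("NP", (leaf("JJ"), leaf("NN"))), ("VP", (leaf("VBZ"),))))
fr1 = ("S", (("NP", (leaf("NN"), leaf("ADJ"))), ("VP", (leaf("V"),))))
en2 = ("S", (("NP", (leaf("JJ"), leaf("NN"))), ("VP", (leaf("VBD"),))))
fr2 = ("S", (("NP", (leaf("NN"), leaf("ADJ"))), ("VP", (leaf("V"),))))
alignment = [((0,), (0,)), ((1,), (1,))]
corpus = [(en1, fr1, alignment), (en2, fr2, alignment)]

patterns = mine_bilingual_patterns(corpus, min_support=2)
for (src_frag, tgt_frag), n in patterns.items():
    print(src_frag, "<->", tgt_frag, "support:", n)
```

Here only the adjective/noun reordering pair (English `NP -> JJ NN` aligned with French `NP -> NN ADJ`) reaches the support threshold. The two VP pairs differ in their English verb tags, so each occurs only once and is filtered out. Requiring a source fragment to recur with a consistent aligned target fragment is one possible reading of patterns being "qualified by the other language".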
