Automatic text chunking aims to recognize grammatical phrase structures in natural language text. Text chunking provides syntactic information for downstream analysis and is an important technology in text mining (TM) and natural language processing (NLP). Existing chunking systems rely on external knowledge, e.g. grammar parsers, or integrate multiple learners to achieve higher performance. However, such external knowledge is often unavailable in many domains and languages, and employing multiple learners not only complicates the system architecture but also increases training and testing time. In this paper, we present a novel phrase chunking model based on the proposed mask method, without employing external knowledge or multiple learners. The mask method automatically derives additional training examples from the original training data, which significantly improves system performance. We evaluated our method on different chunking tasks and languages in comparison with previous studies. The experimental results show that our method achieves state-of-the-art performance in chunking tasks. In two English chunking tasks, i.e., shallow parsing and base-chunking, our method achieves F(β=1) scores of 94.22 and 93.23, respectively. When ported to Chinese, the F(β=1) score is 92.30. Our chunker is also quite efficient: chunking a 50K-word text takes less than 10 s.
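To illustrate the chunking task the abstract describes (not the authors' mask method itself), shallow-parsing datasets commonly encode phrase chunks with BIO tags, where B- marks the beginning of a chunk, I- its continuation, and O tokens outside any chunk. The sketch below, with a hypothetical `bio_to_chunks` helper and made-up example sentence, shows how such tags map to phrase spans:

```python
# Illustrative sketch only: convert BIO-style chunk tags (as used in
# shallow-parsing datasets) into phrase spans. This is NOT the paper's
# mask method; it only demonstrates what a chunker's output looks like.

def bio_to_chunks(tokens, tags):
    """Group tokens into (chunk_type, token_list) phrases from BIO tags."""
    chunks = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new chunk starts; flush the one in progress, if any.
            if current_tokens:
                chunks.append((current_type, current_tokens))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the current chunk.
            current_tokens.append(token)
        else:
            # "O" tag: token lies outside any chunk.
            if current_tokens:
                chunks.append((current_type, current_tokens))
            current_type, current_tokens = None, []
    if current_tokens:
        chunks.append((current_type, current_tokens))
    return chunks

# Hypothetical example sentence with noun-phrase (NP) and verb-phrase (VP) chunks.
tokens = ["He", "reckons", "the", "current", "account", "deficit"]
tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP"]
print(bio_to_chunks(tokens, tags))
# → [('NP', ['He']), ('VP', ['reckons']), ('NP', ['the', 'current', 'account', 'deficit'])]
```

Under this encoding, an F(β=1) score such as the 94.22 reported above is computed over exactly these recovered spans: a predicted chunk counts as correct only if its type and boundaries both match the gold chunk.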