Abstract
Marketing notification texts carry a wide variety of commercial information, so it is worthwhile to analyze and mine them. The foundation of this analysis is Chinese word segmentation, whose speed and accuracy strongly affect all subsequent text mining. We compared the precision, recall, and F-value of four open-source Chinese word segmentation tools (Ansj, HanLP, Word, and Jieba) on third-party datasets. We then compared the segmentation speed of the four tools on one million marketing notification texts. Finally, we manually segmented 5,000 marketing notification texts and used this manual segmentation as the gold standard for evaluating the tools. The experiments show that the Base mode of Ansj is the fastest, and that HanLP offers the best balance between segmentation speed and accuracy. Adding a custom dictionary significantly improves segmentation quality.
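The abstract does not state exactly how precision, recall, and F-value were computed against the manual gold standard. A common convention, assumed here, is to treat each word as a (start, end) character span and count a predicted word as correct only when its span exactly matches a gold span, pooling counts over the whole corpus (micro-averaging). The function names `to_spans` and `corpus_prf` are hypothetical and not part of any of the four tools; this is a minimal sketch, not the evaluation code used in the study.

```python
from typing import Iterable, List, Set, Tuple


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a segmented sentence into a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for w in words:
        end = start + len(w)
        spans.add((start, end))
        start = end
    return spans


def corpus_prf(pairs: Iterable[Tuple[List[str], List[str]]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F-value over (predicted, gold) pairs.

    Assumption: a predicted word is correct only if its span exactly matches
    a gold word span in the same sentence.
    """
    correct = n_pred = n_gold = 0
    for predicted, gold in pairs:
        # Both segmentations must cover exactly the same underlying text.
        if "".join(predicted) != "".join(gold):
            raise ValueError("predicted and gold segmentations cover different text")
        pred_spans = to_spans(predicted)
        gold_spans = to_spans(gold)
        correct += len(pred_spans & gold_spans)
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f_value = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_value


if __name__ == "__main__":
    # Hypothetical marketing-style sentence: the tool merges "八折" and "优惠".
    gold = ["全场", "商品", "八折", "优惠"]
    predicted = ["全场", "商品", "八折优惠"]
    p, r, f = corpus_prf([(predicted, gold)])
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # prints "P=0.667 R=0.500 F=0.571"
```

Pooling the counts before dividing (rather than averaging per-sentence scores) keeps long texts from being underweighted; a per-text macro average is an equally plausible choice if the study reported scores that way.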