A Semi-supervised Approach of Extracting Attribute-Value Pairs of Chinese eBook using Conditional Random Fields

Yongquan Dong, Qiang Chu, Ping Ling

doi:10.4156/jcit.vol8.issue1.28

Abstract

We describe a method for extracting attribute-value pairs from Chinese eBook descriptions, so that book databases can be augmented by representing each eBook as a set of attribute-value pairs. Such a representation is useful for tasks such as demand forecasting and product recommendation. Current attribute-value extraction approaches are either rule-based or machine-learning-based. Because most Chinese eBook descriptions do not follow a consistent structure, rule-based approaches tend not to perform satisfactorily. In this paper, we formulate the extraction task as a sequence labeling problem and solve it with a semi-supervised algorithm based on Conditional Random Fields (CRF). The extraction system requires only a limited number of labeled training examples, which reduces the human effort needed to prepare training data. It uses rich features, including literal, contextual, and semantic features. Finally, the extracted attributes and values are linked into pairs using constraint conditions. Experimental results show that the proposed method performs well at extracting attribute-value pairs from Chinese eBook descriptions.
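To make the sequence labeling formulation concrete, the following is a minimal sketch of the final linking step only, not the method as implemented in the paper. It assumes a character-level BIO tag scheme with the hypothetical labels `ATTR` and `VAL`, hand-written tags standing in for the output of a trained CRF, and a simple adjacency rule (each attribute is paired with the next value that follows it) standing in for the constraint conditions, which the abstract does not specify. The function names `decode_spans` and `link_pairs` are illustrative.

```python
from typing import List, Tuple


def decode_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; close any span that is still open.
            if current_label is not None:
                spans.append((current_label, "".join(current_tokens)))
            current_label = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continue the open span of the same label.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent "I-" tag: close the open span, if any.
            if current_label is not None:
                spans.append((current_label, "".join(current_tokens)))
            current_label = None
            current_tokens = []
    if current_label is not None:
        spans.append((current_label, "".join(current_tokens)))
    return spans


def link_pairs(spans: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pair each ATTR span with the next VAL span (assumed adjacency rule)."""
    pairs: List[Tuple[str, str]] = []
    pending_attr = None
    for label, text in spans:
        if label == "ATTR":
            pending_attr = text
        elif label == "VAL" and pending_attr is not None:
            pairs.append((pending_attr, text))
            pending_attr = None
    return pairs


# Hypothetical description: "author: Lu Xun, publisher: People's Literature
# Publishing House". The tags are written by hand for illustration.
tokens = list("作者:鲁迅,出版社:人民文学出版社")
tags = (
    ["B-ATTR", "I-ATTR", "O"]
    + ["B-VAL", "I-VAL", "O"]
    + ["B-ATTR", "I-ATTR", "I-ATTR", "O"]
    + ["B-VAL"] + ["I-VAL"] * 6
)

pairs = link_pairs(decode_spans(tokens, tags))
print(pairs)
```

In a full system the tags would be predicted by the CRF from the literal, contextual, and semantic features mentioned above, and the linking rule would likely be richer than simple adjacency.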
