Abstract
A vector space model (VSM) composed of selected important features is a common way to represent documents, including patent documents. Patent documents have two special characteristics that make it difficult to apply traditional feature selection methods directly: (a) it is difficult to find common terms for patent documents in different categories; and (b) the class label of a patent document is hierarchical rather than flat. Hence, in this article we propose a new approach that includes a hierarchical feature selection (HFS) algorithm, which can be used to select more representative features with greater discriminative ability to represent a set of patent documents with hierarchical class labels. The performance of the proposed method is evaluated on two document sets of 2400 and 9600 patent documents, with candidate terms extracted from their titles and abstracts. The experimental results reveal that a VSM whose features are selected by a proportional selection process gives better coverage, while a VSM whose features are selected by a weighted-summed selection process gives higher accuracy.