Abstract

Patents are a very important and valuable type of scientific and technical big data. This paper shows how to mine massive patent texts to obtain valuable information and knowledge from the large‐scale candidate sets extracted from them. First, we propose a patent term extraction method that uses co‐occurrence in the abstract and first‐claim sections of patent records. It has three steps: (1) we extract candidate strings according to our definition of a term; (2) we propose an assumption for verifying whether a candidate string is a qualified term, based on the co‐occurrence of terms in the abstract and first claim; and (3) we use term frequency–inverse document frequency (TF‐IDF) or mutual information to rank and select candidate terms. Second, we propose a new method to obtain valuable long‐tail terms from patents: (1) we build long‐tail term–common term pairs as the candidate set; (2) we evaluate the value of each candidate pair; and (3) we demonstrate the method with an example from our results. This study provides a new perspective on extracting terms from the free text of patent records and proposes a new method for obtaining valuable long‐tail terms to aid information analysis over massive patent texts. Copyright © 2016 John Wiley & Sons, Ltd.
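
The three extraction steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give their definition of a term, so word n-grams stand in for candidate strings, "qualified" is taken to mean that a candidate appears in both the abstract and the first claim of the same patent, and the TF‐IDF weighting is the plain textbook form. The function names (`candidates`, `qualified_terms`, `rank_by_tfidf`) and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def candidates(text, max_len=3):
    """Step 1 (assumed): candidate strings are word n-grams of length 1..max_len."""
    words = re.findall(r"[a-z]+", text.lower())
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }


def qualified_terms(abstract, first_claim, max_len=3):
    """Step 2 (assumed): keep candidates that occur in both sections of one patent."""
    return candidates(abstract, max_len) & candidates(first_claim, max_len)


def rank_by_tfidf(patents):
    """Step 3: rank qualified terms by summed tf * log(N / df) over all patents.

    patents is a list of (abstract, first_claim) string pairs.
    """
    per_doc = [qualified_terms(a, c) for a, c in patents]
    # Document frequency: number of patents in which the term is qualified.
    df = Counter(term for terms in per_doc for term in terms)
    n_docs = len(patents)
    scores = Counter()
    for (abstract, claim), terms in zip(patents, per_doc):
        # Normalize the same way as candidates() so n-grams can be matched.
        text = " ".join(re.findall(r"[a-z]+", (abstract + " " + claim).lower()))
        for term in terms:
            tf = len(re.findall(r"\b" + re.escape(term) + r"\b", text))
            scores[term] += tf * math.log(n_docs / df[term])
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up sample records, for illustration only.
patents = [
    ("A lithium battery electrode with coating.",
     "An electrode comprising a lithium battery coating."),
    ("A solar cell panel.",
     "A panel comprising a solar cell."),
]
ranking = rank_by_tfidf(patents)
```

In this sketch a string such as "with", which appears only in the abstract, is filtered out at step 2, and a string such as "a", which is qualified in every patent, receives a TF‐IDF score of zero. Mutual information could replace the scoring line in `rank_by_tfidf` without changing the rest of the pipeline.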
