Abstract

Keyword search is the most common tool offered by patent offices; however, their websites do not provide free software for more advanced research. This paper therefore presents a data-mining framework for patent documents that combines natural language processing techniques with data analysis algorithms. The system has two main goals: technological prospecting analysis and the evaluation of similarity among patents based on their titles and abstracts. For the numerical experiments, we used the US Patent and Trademark Office database, with over a million documents. An S-curve analysis of patents on TFT-LCD, Flash Memory and PDA from 2010 to 2018 showed that the last two technologies are in decline. A word cloud made it possible to see the evolution of the phone from 2010 to 2015. To evaluate the degree of similarity among patents, we investigated Latent Semantic Analysis (LSA), Word2vec and Word Mover’s Distance (WMD) in three different case studies. In addition, these methods were compared with the classical Jaccard index. Numerical results show that LSA and WMD produced similar patent indications, while the Jaccard index produced indications that differed from those of the other three methods.
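
As an illustration of the classical baseline mentioned above, the Jaccard index between two patent titles can be sketched as the size of the intersection of their word sets divided by the size of their union. This is a minimal sketch, not the framework's actual implementation: the tokenizer, the function names `tokenize` and `jaccard`, and the sample titles are all hypothetical, and no stop-word removal or stemming is assumed.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric word tokens.

    Assumption: a plain regex tokenizer; the paper's preprocessing may differ.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| over the word sets of two texts."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        # Convention chosen here: two empty texts have similarity 0.
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical patent titles, invented for illustration only.
title_a = "Flash memory device with error correction"
title_b = "Error correction method for flash memory"
title_c = "Liquid crystal display panel"

score_ab = jaccard(title_a, title_b)  # shares 4 of 8 distinct words
score_ac = jaccard(title_a, title_c)  # shares no words

print(score_ab, score_ac)
```

Because the index only counts exact word overlap, two titles that describe the same idea with different vocabulary score low, which is one plausible reason a set-based measure can disagree with semantic methods such as LSA, Word2vec and WMD.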
