Abstract

Researchers must read and understand a large volume of technical papers, including patent documents, to fully grasp the state-of-the-art technological progress in a given domain. Chemical research is particularly challenging because of the fast growth of newly registered utility patents (also known as intellectual property or IP) that provide detailed descriptions of the processes used to create a new chemical or a new process to manufacture a known chemical. The researcher must be able to understand the latest patents and literature in order to develop new chemicals and processes that do not infringe on existing claims and processes. This research uses text mining, integrated machine learning, and knowledge visualization techniques to effectively and accurately support the extraction and graphical presentation of chemical processes disclosed in patent documents. The computer framework trains a machine learning model called ALBERT for automatic paragraph text classification. ALBERT separates chemical and non-chemical descriptive paragraphs of a patent for effective chemical term extraction. ChemDataExtractor is used to classify chemical terms, such as inputs, units, and reactions, from the chemical paragraphs. A computer-supported graph-based knowledge representation interface is developed to plot the extracted chemical terms and their chemical process links as a network of nodes with connecting arcs. The computer-supported chemical knowledge visualization approach helps researchers to quickly understand the innovative and unique chemicals or processes of any chemical patent of interest.
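The "network of nodes with connecting arcs" described above can be illustrated with a minimal sketch. The triples below are hypothetical examples, not data from the paper, and the helper name `build_process_graph` is an assumption introduced here. The paper's outline lists NetworkX and a spring layout for plotting; this sketch uses only plain dictionaries from the standard library, so it shows the data structure rather than the authors' implementation.

```python
from collections import defaultdict

# Hypothetical (input, reaction, output) triples. In the described framework,
# such terms would come from a chemical term extractor, not from hand coding.
STEPS = [
    ("ethanol", "dehydration", "ethylene"),
    ("ethylene", "polymerization", "polyethylene"),
]


def build_process_graph(steps):
    """Return an adjacency map: node -> list of (arc label, target node)."""
    graph = defaultdict(list)
    for source, reaction, target in steps:
        graph[source].append((reaction, target))
        graph.setdefault(target, [])  # make sure sink nodes appear as nodes too
    return dict(graph)


graph = build_process_graph(STEPS)

# Print every arc as "source --reaction--> target".
for node, arcs in graph.items():
    for reaction, target in arcs:
        print(f"{node} --{reaction}--> {target}")
```

Each chemical term becomes a node, and each reaction becomes a labeled arc, so two patents represented this way could be compared by their nodes and arcs.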

Highlights

  • Researchers studying new materials or novel chemical processes rely on prior art in published technical documents and patented processes registered with patent agencies, such as the United States Patent and Trademark Office (USPTO) or the European Patent Office

  • Different classes of text can be identified directly from the hypertext markup language (HTML) to obtain a specific type of paragraph

  • The graph visualization solutions demonstrate the initial research for offering a complete visual solution to visualizing chemical process patents or other types of process patents

Summary

Introduction

Researchers studying new materials or novel chemical processes rely on prior art in published technical documents and patented processes registered with patent agencies, such as the United States Patent and Trademark Office (USPTO) or the European Patent Office. The traditional text mining method treats numbers as stop words and removes numbers that have no relationship to the domain, such as the numbering of text sections. This approach is not applicable to chemical nouns or noun phrases. With a tool designed to extract chemical terms, the chemical nouns, noun phrases, and their related verbs are abstracted from the paragraphs describing the processes, which yields the chemical-domain keywords used to create a corpus. Although research has started to show the potential of graph-based modeling for visualizing chemical process knowledge, using machine learning to help generate graph-based knowledge representations of chemical-related patents effectively and accurately needs much more research and development. Graphs are comparable, which enables researchers to discuss the similarities and differences between patents.

Processes 2021, 9, 1342
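A small sketch can show why treating numbers as stop words fails for chemical names. The sample sentence and both filter functions are hypothetical illustrations introduced here. They are not the paper's preprocessing code, which this summary does not show.

```python
import re

# Hypothetical sample sentence; not taken from the paper.
TEXT = "2.1 Preparation: add 2-propanol and 1,3-butadiene to 50 mL of water"


def drop_all_numbers(tokens):
    """Traditional filter: treat any token containing a digit as a stop word."""
    return [t for t in tokens if not any(ch.isdigit() for ch in t)]


def drop_standalone_numbers(tokens):
    """Gentler filter: drop only tokens made purely of digits, dots, and commas."""
    return [t for t in tokens if not re.fullmatch(r"[\d.,]+", t)]


tokens = TEXT.replace(":", "").split()

print(drop_all_numbers(tokens))         # chemical names with locants are lost
print(drop_standalone_numbers(tokens))  # chemical names with locants survive
```

The first filter removes the section number "2.1" but also discards "2-propanol" and "1,3-butadiene", whose locants are part of the name. The second filter keeps both names, although it still drops the quantity "50", so a real pipeline would need to decide separately how to treat amounts and units.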

Literature Review
Chemical Text Mining
Paragraph Classification
NetworkX
Spring Layout
Methodology
Methodology knowledge graph generator are shown
Document Preprocess
Chemical Paragraph Recognition Model
Graph Visualization
Findings
Conclusions