Abstract

Digital twins are expected to play a pivotal role in digital transformation. Although process informatics has attracted much attention, physical models are essential to realizing digital twins. However, building a physical model of an industrial process is laborious. We aim to facilitate physical model building by developing an automated physical model building AI, named AutoPMoB, which performs five tasks: 1) retrieving documents about a target process from literature databases, 2) converting each document to HTML format, 3) extracting the information required for building a physical model from the documents, such as variables, equations, and experimental data, 4) judging the equivalence of the information extracted from different documents, and 5) reorganizing the information to output a desired physical model. This study focuses on task 4, specifically on judging the equivalence of variable definitions, i.e., whether two noun phrases represent the same variable. We created a large-scale corpus of chemical engineering papers and built ProcessBERT, a domain-specific language model pre-trained on that corpus. We then proposed a method for judging the equivalence of variable definitions based on ProcessBERT. The method first uses ProcessBERT to obtain embeddings of the two variable definitions, then calculates the cosine similarity between the embeddings, and judges the two definitions to be equivalent when the similarity is larger than a threshold. The proposed method judged the equivalence with higher accuracy than methods based on the original BERT and SciBERT.
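
The decision rule described in the abstract (embed both definitions, compute the cosine similarity, compare it with a threshold) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three-dimensional vectors are made-up stand-ins for ProcessBERT embeddings, the threshold value of 0.9 is a hypothetical choice, and the names `cosine_similarity` and `are_equivalent` are introduced here only for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def are_equivalent(emb_a: Sequence[float], emb_b: Sequence[float],
                   threshold: float) -> bool:
    """Judge two definitions equivalent when similarity exceeds the threshold."""
    return cosine_similarity(emb_a, emb_b) > threshold


# Toy vectors standing in for language-model embeddings (assumed values,
# not produced by ProcessBERT).
emb_reactor_temperature = [0.9, 0.1, 0.2]     # "reactor temperature"
emb_temperature_in_reactor = [0.8, 0.2, 0.3]  # "temperature in the reactor"
emb_feed_flow_rate = [0.1, 0.9, 0.1]          # "feed flow rate"

THRESHOLD = 0.9  # hypothetical; the abstract does not state the value used

print(are_equivalent(emb_reactor_temperature, emb_temperature_in_reactor, THRESHOLD))
print(are_equivalent(emb_reactor_temperature, emb_feed_flow_rate, THRESHOLD))
```

In practice the embeddings would come from the language model rather than being written by hand, and the threshold would be tuned on labelled pairs of variable definitions; how either is done is not specified in the abstract.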
