In this paper, we explore the possibility of applying natural language processing (NLP) to visual model-to-model (M2M) transformations. To this end, we present our research results on information extraction from text labels in process models expressed in Business Process Modeling Notation (BPMN) and in use case models depicted in the Unified Modeling Language (UML), drawing on recent developments in NLP. We focus on three relevant tasks: the extraction of verb and noun phrases that can be used to form relations, the parsing of conjunctive and disjunctive statements, and the detection of abbreviations and acronyms. We approach relation extraction with techniques that combine state-of-the-art NLP language models with grammar-based structure detection built on formal regular expressions. We thoroughly test recent state-of-the-art NLP tools (CoreNLP, Stanford Stanza, Flair, spaCy, AllenNLP, BERT, ELECTRA), as well as custom BERT-BiLSTM-CRF and ELMo-BiLSTM-CRF implementations trained with data augmentations aimed at improving performance on the most ambiguous cases; these tools serve as a foundation for extracting noun and verb phrases from the short text labels typically used in UML and BPMN models. Furthermore, we describe our attempts to improve these extractors by addressing abbreviation and acronym detection with machine learning and by processing conjunctive and disjunctive statements, as both are relevant to advanced text normalization. The results show that the Stanza-based implementation achieved the best performance in phrase extraction and conjunctive phrase processing, while our trained BERT-BiLSTM-CRF outperformed it on verb phrase detection. Our acronym detection approach achieved a precision of 0.78 and an F1-score of 0.73, which we consider a promising result.
While this work was inspired by our ongoing research on partial model-to-model transformations, we believe it is also applicable to other areas that require similar text processing capabilities.
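To make the idea of combining a language model with regular-expression grammar rules more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a label has already been tagged with Universal POS tags (such as the `upos` values a tagger like Stanza produces); the one-character tag codes, the `PATTERN` grammar, and the `extract_relations` function are hypothetical illustrations. Each tag is mapped to a single character so that a regular expression over the resulting string can detect a verb group followed by a noun group, and conjoined verbs and noun phrases are then distributed into (verb, noun phrase) pairs.

```python
import re
from itertools import product

# Hypothetical one-character codes for Universal POS tags; any other tag maps to "O".
TAG_CODES = {"VERB": "V", "NOUN": "N", "PROPN": "N", "ADJ": "A", "DET": "D", "CCONJ": "C"}

# Hypothetical grammar over the code string:
#   verb group: one or more verbs joined by conjunctions ("create and update")
#   noun group: one or more noun phrases joined by conjunctions ("student or teacher")
PATTERN = re.compile(r"(?P<verbs>V(?:CV)*)(?P<nouns>D?A*N+(?:CD?A*N+)*)")


def extract_relations(tagged):
    """Return (verb, noun phrase) pairs from a list of (word, upos) tuples."""
    codes = "".join(TAG_CODES.get(upos, "O") for _, upos in tagged)
    words = [word.lower() for word, _ in tagged]
    relations = []
    for m in PATTERN.finditer(codes):
        # One character per token, so regex offsets are also token indices.
        verbs = [words[i] for i in range(m.start("verbs"), m.end("verbs")) if codes[i] == "V"]
        phrases, current = [], []
        for i in range(m.start("nouns"), m.end("nouns")):
            if codes[i] == "C":  # a conjunction closes the current noun phrase
                phrases.append(" ".join(current))
                current = []
            elif codes[i] != "D":  # determiners are dropped from the normalized phrase
                current.append(words[i])
        phrases.append(" ".join(current))
        # Distribute every verb over every conjoined noun phrase.
        relations.extend(product(verbs, phrases))
    return relations


label = [("Create", "VERB"), ("and", "CCONJ"), ("update", "VERB"),
         ("customer", "NOUN"), ("order", "NOUN")]
print(extract_relations(label))
# [('create', 'customer order'), ('update', 'customer order')]
```

This sketch treats "and" and "or" identically by distributing the verb over every alternative; a real system would likely need to keep disjunctions distinct and handle the ambiguous cases that the abstract attributes to the trained models rather than to fixed rules.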