The use of different techniques and tools is a common practice to cover all stages in the development life-cycle of systems generating a significant number of work products. These artefacts are frequently encoded using diverse formats, and often require access through non-standard protocols and formats. In this context, Model-Based Systems Engineering (MBSE) emerges as a methodology to shift the paradigm of Systems Engineering practice from a document-oriented environment to a model-intensive environment. To achieve this major goal, a formalised application of modelling is employed throughout the life-cycle of systems to generate various system artefacts represented as models, such as requirements, logical models, and multi-physics models. However, the mere use of models does not guarantee one of the main challenges in the Systems Engineering discipline, namely, the reuse of system artefacts. Considering the fact that models are becoming the main type of system artefact, it is necessary to provide the capability to properly and efficiently represent and retrieve the generated models. In light of this, traditional information retrieval techniques have been widely studied to match existing software assets according to a set of capabilities or restrictions. However, there is much more at stake than the simple retrieval of models or even any piece of knowledge. An environment for model reuse must provide the proper mechanisms to (1) represent any piece of data, information, or knowledge under a common and shared data model, and (2) provide advanced retrieval mechanisms to elevate the meaning of information resources from text-based descriptions to concept-based ones. This need has led to novel methods using word embeddings and vector-based representations to semantically encode information. Such methods are applied to encode the information of physical models while preserving their underlying semantics. In this study, a text corpus from MATLAB Simulink models was preprocessed using Natural Language Processing (NLP) techniques and trained to generate word vector representations. Then, the presented method was validated using a testbed of MATLAB Simulink physical models in which verbalisations of models are transformed into vectors. The effectiveness of the proposed solution was assessed through a use case study. Evaluation of the results demonstrates a precision value of 0.925, a recall value of 0.865, and an F1 score of 0.884.
Read full abstract