Abstract

Introduction

In most cases, automatic processing can only be conducted on a structured and normalized data set. If the data set is unstructured, most data processing algorithms fail to provide adequate results. The selection of a formalized model for the data is usually conducted manually. The main reason for the manual approach is that even state-of-the-art text mining algorithms are incapable of processing the complex semantics of non-formalized and unstructured textual data. On the other hand, important patterns can be extracted only from structured data. A variety of methods provide basic data cleansing and structuring services: clustering algorithms, normalization techniques, data cleansing, etc. For more complex cases, a general automatic construction of a formalized model can be realized. This becomes possible due to the high abstraction power of graph theory. However, graph theory must be used carefully: applying it without considering the features of the research field can lead to models that lack the level of detail needed for further processing. Common approaches to the processing of connected and poorly structured data are based on graph theory.
Poorly formalized and unstructured data is one of the largest segments in data processing. Experts agree that almost 80 to 85 percent of business-relevant information originates in unstructured form. The manual processing of unstructured data is costly and time-consuming. To interpret unstructured information, techniques such as natural language processing (NLP), data mining, and text analytics are used. The patterns found can then be organized and structured using mathematical graphs. Different kinds of graphs can be used to represent different features of the data.
The field of innovation management was selected to apply the results of the research, because the development of this novel approach to data structuring is part of the work on a universities' innovation life cycle model (ILCM) and an innovation management system. The basic components of innovation management are ideas and innovations. For example, an idea can be represented by unstructured textual descriptions and a group of illustrations. It is important to classify the idea based on its description. Manual classification becomes difficult when the number of ideas is large. An idea can be provided in the form of a term-rich description. Such a description contains specific keywords. There are many techniques to identify such words and phrases; however, these approaches leave aside the problems of textual structure and semantics. There are other attributes that must be used in order to structure ideas. Considering all these attributes, the process of idea management becomes fairly complicated, so automatic tools become highly important for efficient innovation management. Back in 1986, Andrew H. Van de Ven outlined four central problems in innovation management (Van de Ven, 1986): managing attention; managing ideas into good currency; managing part-whole relationships; and institutional leadership. According to a 2004 paper, these cornerstone problems persist and continue to heavily influence the performance of the innovation management process (Van de Ven and Engleman, 2004). In our work we concentrate on the automatic structuring of ideas and innovations, a task that falls within the scope of the problem of managing ideas into good currency. One of the major parts of this problem is to correctly identify the most promising and optimal (with respect to the given resources) idea. One of the best ways to estimate an idea would be to compare it with implemented analogues.
The estimation can become rather complicated when considering the interconnections between ideas and their dependencies on each other or on a specific innovation. In the second section of the paper we provide a literature review of automatic and semi-automatic graph-based decision-support approaches that deal with the processing of poorly formalized data, as well as a basic review of the innovation management literature. …

