Abstract

For a certain class of analytical processes, there is a recent trend toward intellectualizing the software components that implement them. This entails accumulating knowledge about how the analytical system functions, including knowledge of the analyst's actions. The accumulated knowledge allows the software to classify new data independently and to offer the user the most appropriate steps of an analytical activity scenario.

An analytical activity scenario is treated as a knowledge representation that describes a sequence of related events in the form of a directed acyclic graph. The article proposes an approach to intellectualizing the process of forming such a scenario, based on the development of machine learning methods, namely Classification and Regression Trees. The approach is applied with a combination of metrics for evaluating its effectiveness.

The authors propose their own version of the intellectualization software, which implements the Classification and Regression Trees method in the Python programming language. This version differs from known implementations in that different metrics can be used to analyze the quality of the partition and, through it, to choose the next probable step of an analytical scenario. Unlike existing approaches, the authors offer a choice of the most suitable metric for assessing how closely the partition approximates the desired learning result: the Gini coefficient or Shannon's information entropy.

The first step in constructing a scenario is to describe the matrix of all possible states of the directed graph, which reflects the sequence of user actions taken to achieve the goal. Then the heterogeneity of the input data, which contain a matrix of possible scenario actions, is computed. The measure of heterogeneity is Shannon's information entropy.
The Gini coefficient is used to improve the quality of the partition. To construct the decision tree, the best splitting criterion separating the «True» and «False» branches is calculated. Based on the decision tree, the program offers the user the most appropriate next steps. The algorithm is supplemented by a more convenient mechanism for forming the semantic conditions of a transition as «if-then» operators.

The suggested approach reduces the number of erroneous user actions, especially by inexperienced users, when forming complex scenarios with a variety of conditions for applying data analysis operators.
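
The two partition-quality metrics named above, and the search for the best «True»/«False» split, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "Gini coefficient" means the Gini impurity commonly used in Classification and Regression Trees, and that the scenario matrix is a list of rows of binary condition flags, each paired with the next step that was taken. The function names, the toy matrix, and the step labels are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def gini_impurity(labels):
    """Gini impurity of a list of class labels (assumed meaning of 'Gini coefficient')."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def best_split(rows, labels, metric=gini_impurity):
    """Return (column, gain) for the binary column whose True/False
    partition reduces the chosen heterogeneity metric the most."""
    parent = metric(labels)
    best_col, best_gain = None, 0.0
    for col in range(len(rows[0])):
        true_branch = [y for x, y in zip(rows, labels) if x[col]]
        false_branch = [y for x, y in zip(rows, labels) if not x[col]]
        if not true_branch or not false_branch:
            continue  # this column does not partition the data
        weighted = (
            len(true_branch) * metric(true_branch)
            + len(false_branch) * metric(false_branch)
        ) / len(labels)
        gain = parent - weighted
        if gain > best_gain:
            best_col, best_gain = col, gain
    return best_col, best_gain


# Hypothetical toy scenario matrix: each row flags which conditions hold,
# and each label is the next step an analyst took in that state.
rows = [(1, 0), (1, 1), (0, 1), (0, 0)]
next_steps = ["filter", "filter", "aggregate", "aggregate"]

col_gini, gain_gini = best_split(rows, next_steps, metric=gini_impurity)
col_entropy, gain_entropy = best_split(rows, next_steps, metric=shannon_entropy)

# Express the chosen split as an "if-then" condition.
print(f"if condition[{col_gini}] then 'filter' else 'aggregate'")
```

In this toy matrix both metrics select the first condition, because it separates the two next steps completely. On less regular data the two metrics can rank candidate splits differently, which is why passing the metric as a parameter is a natural way to model the choice described above.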
