Abstract

This paper shows the added value of using existing domain-specific knowledge to generate new derived variables that complement a target dataset, and the benefits of including these new variables in subsequent data analysis. The main contribution is a methodology for generating these variables as part of preprocessing, following a dual approach: creating 2nd generation knowledge-driven variables, which capture the criteria experts use when reasoning in the field, or 3rd generation data-driven indicators, which are created by clustering the original variables. Data Mining and Artificial Intelligence techniques such as Clustering or Traffic Light Panels help to obtain successful results. Some results of the INSESS-COVID19 project are presented. Basic descriptive analysis gives simple results that are useful for supporting basic policy-making, especially in health, but a much richer global perspective is acquired once derived variables are included. When 2nd generation variables are available and can be introduced into the method for creating 3rd generation data, added value is obtained both from the basic analysis and from building new data-driven indicators.

