Abstract
Multiple-set multivariate data analysis methods make it possible to analyze a series of data tables jointly. In particular, the STATIS-dual method is applied to data tables in which the individuals can vary from one table to another while the analyzed variables remain fixed. However, when the number of variables or indicators is large, the results of traditional multiple-set methods are difficult to interpret. For this reason, this paper proposes a new methodology, which we call Sparse STATIS-dual. It incorporates the elastic net penalty, which retains the most important variables of the model and yields more precise and interpretable results. To complement the new methodology and make it applicable to data tables with fixed variables, a package was created in the R programming language under the name Sparse STATIS-dual. Finally, an application to real data is presented, and the results of STATIS-dual and Sparse STATIS-dual are compared. The proposed method improves the informative capacity of the data and offers more easily interpretable solutions.
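The effect of an elastic net penalty on loadings can be illustrated with a small sketch. The exact Sparse STATIS-dual criterion is not given in this section, so as a stand-in we assume the one-component sparse PCA (SPCA) formulation of Zou, Hastie and Tibshirani, applied to a symmetric positive semidefinite matrix such as a STATIS-dual compromise. The function names elastic_net_loading and soft_threshold, and the parameters lam1 (lasso weight) and lam2 (ridge weight), are hypothetical and do not come from the authors' R package.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Lasso soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return float(np.sign(z) * max(abs(z) - t, 0.0))


def elastic_net_loading(sigma, lam1, lam2, n_outer=50, n_sweeps=200, tol=1e-10):
    """Sparse first loading of a symmetric PSD matrix `sigma` (hypothetical helper).

    Assumption: SPCA-style elastic-net criterion (Zou, Hastie & Tibshirani),
    not the paper's exact Sparse STATIS-dual formulation. For fixed unit alpha:
        min_beta  beta'(sigma + lam2*I)beta - 2*alpha'sigma*beta + lam1*||beta||_1
    alternated with alpha = sigma*beta / ||sigma*beta||.
    """
    sigma = np.asarray(sigma, dtype=float)
    p = sigma.shape[0]
    a_mat = sigma + lam2 * np.eye(p)          # quadratic term including the ridge part
    alpha = np.linalg.eigh(sigma)[1][:, -1]   # start from the dense leading eigenvector
    beta = np.zeros(p)
    for _ in range(n_outer):
        c = sigma @ alpha
        # Coordinate descent on the elastic-net subproblem for fixed alpha.
        for _ in range(n_sweeps):
            beta_old = beta.copy()
            for j in range(p):
                # Linear term for coordinate j, excluding its own contribution.
                z = c[j] - (a_mat[j] @ beta - a_mat[j, j] * beta[j])
                beta[j] = soft_threshold(z, lam1 / 2.0) / a_mat[j, j]
            if np.max(np.abs(beta - beta_old)) < tol:
                break
        s_beta = sigma @ beta
        norm = np.linalg.norm(s_beta)
        if norm == 0.0:  # penalty so strong that every loading was set to zero
            return np.zeros(p)
        new_alpha = s_beta / norm
        converged = np.max(np.abs(new_alpha - alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    return beta / np.linalg.norm(beta)  # unit-norm sparse loading


sigma = np.array([[4.0, 3.6, 0.0, 0.0],
                  [3.6, 4.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 0.5]])
print(elastic_net_loading(sigma, lam1=1.0, lam2=0.1))
```

In this sketch the lasso term lam1 sets the loadings of weakly contributing variables exactly to zero, while the ridge term lam2 stabilizes the solution when variables are strongly correlated, which is the usual reason for preferring the elastic net over the lasso alone.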
Highlights
Classic methods of multivariate analysis operate on two-way data [1], whose rows and columns collect, in a data matrix, the information provided by the individuals and the variables, respectively. When this matrix is analyzed, all the variables are considered at the same time, and the information extracted represents a global view of the system [2,3].
To present the main aspects of STATIS-dual and Sparse STATIS-dual, and to show the usefulness of both methods in the analysis of three-way data, we used panel data (2016–2020) from the Global Innovation Index [26], which integrates 80 innovation indicators covering more than 130 economies. This index captures the multidimensional facets of innovation across countries and supports the monitoring of the innovation factors that inform more effective public policies for society and the world economy.
One of the most important areas of current research in multivariate data analysis is the development of efficient techniques for the study of large data matrices [22,58,59].
Summary
Experiments are often designed so that the variables are examined at different moments in time, which calls for three-way multivariate data analysis techniques [4,5]. Three-way data are therefore organized by a first index that identifies the individuals under study, a second index for the variables measured on those individuals, and a third index for the situations (moments) in which the measurements are made [6]. The purpose of adding a third way is to analyze the similarities and differences between situations through the configurations of the individuals and the relationships between the groups of variables. Following this idea, Kiers [7] distinguishes between three-way data and multiple-set data. He defines three-way data as observations of all objects on all variables and on all occasions, and multiple-set data as observations of different sets of objects and/or variables on different occasions [8,9].
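For multiple-set data in which the individuals change from one occasion to the next but the variables are shared, which is the STATIS-dual setting described in the abstract, the first steps of the analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's R implementation: each table is standardized with uniform row weights, the per-table p × p correlation matrices are compared through RV coefficients (the interstructure), and the compromise is a weighted sum of those matrices, with weights taken from the leading eigenvector of the RV matrix and rescaled to sum to one. The function name statis_dual is hypothetical, and other weighting and normalization conventions exist.

```python
import numpy as np


def statis_dual(tables):
    """Minimal STATIS-dual sketch (hypothetical helper, not the paper's package).

    tables: list of (n_t x p) arrays sharing the same p columns; n_t may differ.
    Assumes no column is constant within a table (otherwise std is zero).
    Returns (rv, alpha, compromise).
    """
    covs = []
    for x in tables:
        x = np.asarray(x, dtype=float)
        xc = (x - x.mean(axis=0)) / x.std(axis=0)  # centre and scale each table
        covs.append(xc.T @ xc / xc.shape[0])       # p x p correlation matrix, row weights 1/n_t
    k = len(covs)
    # Hilbert-Schmidt inner products <V_s, V_t> = trace(V_s V_t) between tables.
    s = np.array([[np.trace(covs[a] @ covs[b]) for b in range(k)] for a in range(k)])
    d = np.sqrt(np.diag(s))
    rv = s / np.outer(d, d)                        # RV coefficients: the interstructure
    # Leading eigenvector of the (nonnegative) RV matrix has entries of one sign.
    _, vecs = np.linalg.eigh(rv)
    u = np.abs(vecs[:, -1])
    alpha = u / u.sum()                            # assumed convention: weights sum to one
    compromise = np.einsum("t,tij->ij", alpha, np.stack(covs))
    return rv, alpha, compromise


rng = np.random.default_rng(0)
tables = [rng.normal(size=(n, 3)) for n in (5, 8, 6)]  # three occasions, three shared variables
rv, alpha, compromise = statis_dual(tables)
print(np.round(rv, 3))
print(np.round(alpha, 3))
```

In a typical STATIS-dual analysis, the variable map is then obtained from the eigendecomposition of the compromise, for instance with np.linalg.eigh(compromise), and the individuals of each table can be projected onto those axes as supplementary points.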