Procedures for Reduced-Rank Regression

P T Davies, M K-S Tso

doi:10.2307/2347998

Abstract

SUMMARY We discuss in this paper procedures for the analysis of the reduced-rank regression model. A new method of parameter estimation is proposed, justified by a least-squares analysis employing matrix singular-value decomposition and the Eckart-Young theorem. The application of the model is illustrated by the regression analysis of gasoline distillation measurements on composition data obtained by gas-liquid chromatography.

In the study of the experimental properties of mixtures, a linear model is often proposed to relate response to composition. The statistical technique of linear regression analysis is then appropriate, and when there are a number of responses of interest it is often applied separately to each one. However, the responses are often interrelated, so that, for instance, it may be possible to use an empirical linear relationship to predict the approximate value of one response from knowledge of the others. In these circumstances the procedure for determining the regression coefficients of response on composition should be modified to reflect the known presence of such relationships (whose linearity is implied by the mutual linear dependence of the responses on composition). This leads to consideration of the multivariate regression model with a constraint imposed on the rank of the matrix of coefficients, sometimes termed reduced-rank regression. Such models have been studied by, for example, Izenman (1975) and Burket (1964), the latter using a factor analysis model.

We have applied reduced-rank regression to the investigation of hydrocarbon fuels, whose composition is routinely characterized by gas-liquid chromatography (g.l.c.). The responses are, in the main, engine test ratings (each having a nominal repeatability or associated precision), which are often known to be correlated or otherwise interrelated. For instance, they are frequently known to depend critically on the same portion of the g.l.c. chromatogram.
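
The least-squares argument behind a rank-constrained fit can be sketched as follows. Assuming the composition matrix X has full column rank, the residual sum of squares splits as ||Y - XB||^2 = ||Y - Yhat||^2 + ||Yhat - XB||^2, where Yhat = X Bhat is the unrestricted least-squares fit. By the Eckart-Young theorem the second term is minimized over rank-r fits by truncating the singular-value decomposition of Yhat to its r leading components, which gives B_r = Bhat V_r V_r^T. The NumPy function below is a minimal sketch of that standard unweighted estimator under those assumptions; it is an illustration, not necessarily the exact procedure proposed in this paper, and the function name reduced_rank_regression and the synthetic data are hypothetical.

```python
import numpy as np


def reduced_rank_regression(X, Y, rank):
    """Least-squares reduced-rank regression of Y (n x q) on X (n x p).

    Returns a p x q coefficient matrix B of rank <= `rank` minimising
    ||Y - X B||_F^2, assuming X has full column rank.
    """
    # Unrestricted least-squares fit and its fitted values Y_hat = X B_ols.
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ B_ols
    # SVD of the fitted values; the rows of Vt are the right singular vectors.
    _, _, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    V_r = Vt[:rank].T  # q x r: leading right singular vectors
    # Eckart-Young: Y_hat V_r V_r^T is the best rank-r approximation of Y_hat,
    # and it equals X (B_ols V_r V_r^T), so project the OLS coefficients.
    return B_ols @ V_r @ V_r.T


# Usage on hypothetical synthetic data (not the gasoline data of the paper):
# 30 samples, 8 composition variables, 5 responses driven by 2 latent factors.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))
B_true = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 5))
Y = X @ B_true + 0.5 * rng.normal(size=(30, 5))
B_2 = reduced_rank_regression(X, Y, rank=2)
# Singular values of the estimate: all but the first 2 are at rounding level.
print(np.linalg.svd(B_2, compute_uv=False))
```

Projecting the unrestricted coefficients, rather than refitting, keeps the sketch to one regression and one SVD. When there are at least as many samples as responses and rank equals the number of responses, V_r V_r^T is the identity and the ordinary least-squares coefficients are recovered.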
The precision of the g.l.c. determination is quite high and the resulting data vector may be regarded as deterministic. The test results on the other hand typically exhibit a high degree of scatter. Under these circumstances a multivariate regression model was considered appropriate. The rank restriction was employed to ascertain whether a relatively small number of specific features of the chromatogram could be used to predict fuel properties, and to identify these features. It is not unusual for chromatographic data on a hydrocarbon fuel to comprise a very large number of resolved peaks, typically 200-300. This number is usually much larger than the number of fuel samples available or economically justified for experimentation. A composition vector corresponding to a chromatogram with good resolution will contain a large number of components, many of which may be effectively zero. A reduction to a smaller number of components may be achieved by, say, bulking the
