Abstract

The purpose of this article is to consider the use of the EM algorithm (Dempster, Laird, and Rubin 1977) for both maximum likelihood (ML) and restricted maximum likelihood (REML) estimation in a general repeated measures setting, using a multivariate normal data model with linear mean and covariance structure (Anderson 1973). Several models and methods of analysis have been proposed in recent years for repeated measures data; Ware (1985) presented an overview. Because the EM algorithm is a general-purpose, iterative method for computing ML estimates with incomplete data, it has often been used in this setting (Dempster et al. 1977; Andrade and Helms 1984; Jennrich and Schluchter 1985). There are two apparently different approaches to using the EM algorithm here.

In the first, each experimental unit is observed under a standard protocol specifying measurements at each of n occasions (or under n conditions), and incompleteness means that, for at least some units, fewer than the requisite n measurements are actually collected. In this circumstance, incompleteness may be modeled by regarding the measurements actually collected as the observed data, the conceptual set of n measurements on each individual as the complete data, and the missing measurements on units with fewer than n observations as the unobserved data. Application of the EM algorithm in this setting [referred to as "missing data" in Dempster et al. (1977) and "incomplete data" in Jennrich and Schluchter (1985)] was discussed by Orchard and Woodbury (1972), Beale and Little (1975), and Jennrich and Schluchter (1985). One drawback of this approach in the longitudinal data setting is that the multivariate model with linear mean and covariance structure does not, in general, possess closed-form solutions even with complete data (Anderson 1973; Szatrowski 1980). Thus implementing the EM algorithm requires either an iterative M step within each EM iteration or the use of a generalized EM (GEM) algorithm, which requires only that the complete-data likelihood be increased, rather than maximized, at each M step. A second drawback is that this approach requires specification of the covariates for both the observed and the missing observations. If the covariates are unknown for the missing observations, arbitrary values must be specified, which may affect the rate but not the final point of convergence (Jennrich and Schluchter 1985).

The second application of the EM algorithm arises naturally when mixed models are used to analyze serial measurements. In this setting, the incomplete data are modeled quite differently. The observed data are as before, that is, the measurements actually collected on each unit. The complete data, however, consist of the observed data plus the unobservable random parameters and error terms specified in the mixed model. Thus the missing data (the random parameters and error terms) would not be viewed as data in the traditional statistical sense. Laird and Ware (1982) and Andrade and Helms (1984) took this approach.
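As a concrete illustration of this mixed-model formulation, the sketch below shows one EM iteration for a Laird–Ware style model y_i = X_i β + Z_i b_i + e_i with b_i ~ N(0, D) and e_i ~ N(0, σ²I). It is a minimal sketch under assumed notation: the function and variable names are illustrative, not taken from the article, and the mean parameters are refreshed by generalized least squares at the current variance components, a common variant sharing the same fixed point as the pure EM update of β.

```python
import numpy as np

def em_mixed_model(y, X, Z, beta, D, sigma2, n_iter=500, tol=1e-8):
    """EM sketch for ML in y_i = X_i beta + Z_i b_i + e_i,
    b_i ~ N(0, D), e_i ~ N(0, sigma2 * I); y, X, Z are lists over the
    m units. Hypothetical names, not the article's code."""
    m = len(y)
    N = sum(len(yi) for yi in y)
    p = X[0].shape[1]
    for _ in range(n_iter):
        beta_old = beta.copy()
        XtVX, XtVy = np.zeros((p, p)), np.zeros(p)
        D_new, sse = np.zeros_like(D), 0.0
        # E step: conditional moments of the "missing" b_i and e_i
        for yi, Xi, Zi in zip(y, X, Z):
            ni = len(yi)
            Vinv = np.linalg.inv(Zi @ D @ Zi.T + sigma2 * np.eye(ni))
            ri = yi - Xi @ beta
            bi = D @ Zi.T @ Vinv @ ri              # E[b_i | y_i]
            Vb = D - D @ Zi.T @ Vinv @ Zi @ D      # Var(b_i | y_i)
            ei = ri - Zi @ bi                      # E[e_i | y_i]
            D_new += np.outer(bi, bi) + Vb         # E[b_i b_i' | y_i]
            # E[e_i'e_i | y_i] = ei'ei + tr Var(e_i | y_i)
            sse += ei @ ei + sigma2 * (ni - sigma2 * np.trace(Vinv))
            XtVX += Xi.T @ Vinv @ Xi
            XtVy += Xi.T @ Vinv @ yi
        # M step: closed-form maximizers of the complete-data likelihood
        D, sigma2 = D_new / m, sse / N
        beta = np.linalg.solve(XtVX, XtVy)         # GLS refresh of the mean
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta, D, sigma2
```

Here the M step is available in closed form because, given the random effects, the complete-data likelihood factors into a regression part and a variance-component part; this is the property the next paragraph exploits.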
This article shows that the latter approach is more general and encompasses the missing-data approach as a special case. This result has several important applications. First, it means that EM algorithms written for models with random effects can also be used for multivariate normal models with arbitrary covariance structure and missing data. Second, this approach avoids specification of covariates for missing observations. Finally, use of the general formulation means that closed-form solutions for the complete-data maximization will exist for a much broader class of models, enabling one to avoid use of GEM or iterations within each M step.

For a certain class of multivariate growth curve models with random effects structure (Reinsel 1982), closed-form solutions exist for both ML and REML estimates of the mean and covariance parameters. Formulas for these closed-form solutions are given that are applicable whenever the solution is not on the boundary. The choice of starting values for the EM iterations is important, since the EM algorithm will not, in general, converge from arbitrary starting values to the closed-form solution (if it exists) in one iteration. Several possibilities for starting values are given.

The rate of convergence of the EM algorithm is generally linear. The actual speed of convergence in two data examples is shown to depend heavily on both the actual data set and the assumed structure for the covariance matrix. We discuss two methods for accelerating convergence, which we find are most useful when the covariance matrix is assumed to have a random effects structure. When the covariance matrix is assumed to be arbitrary, the EM iterations reduce to familiar iteratively reweighted least squares (IRLS) computations. The EM algorithm has the unusual property in this setting that when all of the data are complete (no missing observations), the iterations are still IRLS, but the rate of convergence changes from linear to quadratic.
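To make the unstructured-covariance case concrete, the following minimal sketch (assuming complete data, with illustrative names of my own choosing, not the article's) shows the IRLS form such iterations take: a generalized least squares step for the mean parameters alternates with a residual cross-product update of the covariance matrix.

```python
import numpy as np

def irls_unstructured(y, X, n_iter=100, tol=1e-8):
    """IRLS sketch for the complete-data case with arbitrary covariance:
    y is (m, n) with every unit observed at all n occasions; X holds the
    m unit design matrices (each n x p). Hypothetical names, not the
    article's code."""
    m, n = y.shape
    p = X[0].shape[1]
    beta, Sigma = np.zeros(p), np.eye(n)
    for _ in range(n_iter):
        Sinv = np.linalg.inv(Sigma)
        # reweighted least squares step for the mean:
        #   beta = (sum_i X_i' S^-1 X_i)^-1 sum_i X_i' S^-1 y_i
        XtSX = sum(Xi.T @ Sinv @ Xi for Xi in X)
        XtSy = sum(Xi.T @ Sinv @ yi for Xi, yi in zip(X, y))
        beta_new = np.linalg.solve(XtSX, XtSy)
        # covariance step: average of residual cross-products
        R = y - np.stack([Xi @ beta_new for Xi in X])
        Sigma = R.T @ R / m
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, Sigma
        beta = beta_new
    return beta, Sigma
```

With incomplete data, the covariance step would additionally require the conditional expectations of the missing residual cross-products supplied by the E step; the complete-data case shown here is the one for which the convergence rate becomes quadratic.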