Abstract
Sparse Bayesian Extreme Learning Machine (SBELM) constructs an extremely sparse and probabilistic model with low computational cost and high generalization. However, the update rule for the hyperparameters (ARD prior) in SBELM uses the diagonal elements of the inverted covariance matrix computed over the full training dataset, which raises two issues. Firstly, inverting the Hessian matrix may suffer from ill-conditioning in some cases, which prevents SBELM from converging. Secondly, it may result in memory overflow, with O(L³) computational memory (L: number of hidden nodes) needed to invert the large covariance matrix for updating the ARD priors. To address these issues, an inverse-free SBELM called QN-SBELM is proposed in this paper, which integrates the gradient-based Quasi-Newton (QN) method into SBELM to approximate the inverse covariance matrix. It requires only O(L²) computation and is therefore scalable to large problems. QN-SBELM was evaluated on benchmark datasets of different sizes. Experimental results verify that QN-SBELM achieves more accurate results than SBELM with a sparser model, provides more stable solutions, and extends well to large-scale problems.
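As a rough illustration of this idea, the sketch below (NumPy, with assumed names; it does not reproduce the paper's exact update rule) shows a BFGS-style quasi-Newton update that maintains an approximate inverse Hessian at O(L²) per step, together with a standard Tipping-style ARD re-estimation rule that consumes only the diagonal of that approximation instead of an explicit O(L³) matrix inversion.

```python
# Minimal sketch (assumed notation, not the authors' code) of the idea behind
# QN-SBELM: maintain an approximate inverse Hessian with a BFGS-style
# quasi-Newton update at O(L^2) per step instead of an explicit O(L^3)
# inversion, then reuse its diagonal in the ARD hyperparameter re-estimation.
import numpy as np

def bfgs_inverse_update(H_inv, s, y):
    """One BFGS update of the inverse-Hessian approximation.

    s = x_new - x_old (parameter step), y = grad_new - grad_old.
    Costs O(L^2): one matrix-vector product plus a few outer products.
    """
    rho = 1.0 / float(y @ s)
    Hy = H_inv @ y
    yHy = float(y @ Hy)
    return (H_inv
            - rho * (np.outer(s, Hy) + np.outer(Hy, s))
            + (rho * rho * yHy + rho) * np.outer(s, s))

def ard_reestimate(alpha, mu, Sigma_diag):
    """Tipping-style ARD rule alpha_k <- gamma_k / mu_k^2 with
    gamma_k = 1 - alpha_k * Sigma_kk; here Sigma_diag would come from the
    diagonal of the quasi-Newton inverse-Hessian approximation (this may
    differ in detail from the paper's formula (8))."""
    gamma = 1.0 - alpha * Sigma_diag
    return gamma / (mu ** 2 + 1e-12)
```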
Highlights
Extreme learning machines (ELMs) are a kind of random projection-based neural network [1] [2]
In QN-SBELM, the approximated Φ−1 can be employed in formula (8) to update the automatic relevance determination (ARD) prior, without requiring the full training dataset to be loaded into memory beforehand
An inverse-free SBELM called QN-SBELM was introduced in this paper, which integrates a robust solver based on the Quasi-Newton (QN) method into SBELM
Summary
Extreme learning machines (ELMs) are a kind of random projection-based neural network [1] [2]. One of the significant advantages of SBELM over ELM is that many hidden neurons can be pruned during the learning phase by setting the corresponding weights to zero, resulting in an extremely sparse network and speedy execution time. Based on the ARD theory, many αk are tuned to infinity during the iterations such that their associated wk become zero. With this mechanism, most randomly generated hidden neurons and their connected input weights are pruned.
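The pruning mechanism can be pictured with the short sketch below; the threshold and function name are illustrative assumptions rather than values taken from the paper. Any hidden node whose ARD precision has effectively diverged is treated as αk → ∞, its weight is set exactly to zero, and the node is dropped from the network.

```python
# Illustrative sketch of ARD-based pruning (names and threshold are
# assumptions for clarity, not taken from the paper).
import numpy as np

ALPHA_PRUNE = 1e9  # stands in for "alpha_k tuned to infinity"

def prune_hidden_nodes(alpha, w, H):
    """Drop hidden nodes whose ARD precision has diverged.

    alpha : (L,) ARD precisions, one per hidden node
    w     : (L,) output weights
    H     : (N, L) hidden-layer activation matrix for N samples
    """
    keep = alpha < ALPHA_PRUNE
    w = np.where(keep, w, 0.0)          # pruned weights become exactly zero
    return alpha[keep], w[keep], H[:, keep]
```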