Abstract

Recently, mini-batch gradient descent (MBGD)-based optimization has become popular for Takagi–Sugeno–Kang (TSK) fuzzy system optimization. However, it still faces challenges, including the curse of dimensionality and sensitivity to the choice of the optimizer. The former has been alleviated by our previously proposed high-dimensional TSK (HTSK) algorithm. In this article, we point out that the latter is caused by the gradient vanishing problem on the rule consequent parameters, which in turn is caused by the small magnitude of the normalized rule firing levels, especially when the number of rules is large. As a result, the rule consequents are easily trapped in a bad local minimum with poor generalization performance. We propose to first apply layer normalization (LN) to amplify the small firing levels, and then a rectified linear unit (ReLU) to discard rules far away from the current training sample. We evaluated the proposed HTSK-LN and HTSK-LN-ReLU on twelve regression datasets with various sizes and dimensionalities. Experiments demonstrated that they can significantly improve the generalization performance, regardless of the training set size, feature dimensionality, choice of the optimizer, and rulebase size.
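To make the mechanism concrete, below is a minimal, illustrative sketch of how LN and ReLU could be inserted after the normalized firing levels of an HTSK-style model. It is not the authors' reference implementation; the class name, parameterization (Gaussian membership functions, first-order consequents), and variable names are assumptions made only for illustration.

```python
# Hedged sketch of the LN + ReLU idea on HTSK firing levels (assumed details).
import torch
import torch.nn as nn


class HTSKLNReLU(nn.Module):
    """TSK fuzzy system with HTSK firing levels, LayerNorm, and optional ReLU."""

    def __init__(self, in_dim, n_rules, use_relu=True):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_rules, in_dim))     # Gaussian MF centers
        self.log_sigmas = nn.Parameter(torch.zeros(n_rules, in_dim))  # log of MF widths
        self.consequents = nn.Linear(in_dim, n_rules)                 # one linear consequent per rule
        self.ln = nn.LayerNorm(n_rules)                               # normalizes over the rule dimension
        self.use_relu = use_relu

    def forward(self, x):                                             # x: (batch, in_dim)
        d = (x.unsqueeze(1) - self.centers) / torch.exp(self.log_sigmas)  # (batch, n_rules, in_dim)
        # HTSK: average (rather than sum) the per-dimension log memberships,
        # which mitigates numerical underflow in high dimensions.
        log_firing = -0.5 * d.pow(2).mean(dim=2)                      # (batch, n_rules)
        w = torch.softmax(log_firing, dim=1)                          # normalized firing levels
        # LN amplifies the small normalized firing levels; ReLU then zeroes out
        # rules whose centered firing levels fall below average, i.e., rules
        # far away from the current sample.
        w = self.ln(w)
        if self.use_relu:
            w = torch.relu(w)
        y_r = self.consequents(x)                                     # (batch, n_rules) rule outputs
        return (w * y_r).sum(dim=1, keepdim=True)                     # weighted aggregation


# Example usage (hypothetical shapes):
# model = HTSKLNReLU(in_dim=10, n_rules=30)
# y_hat = model(torch.randn(64, 10))   # -> (64, 1)
```

When the number of rules is large, the softmax outputs are on the order of 1/R, so their gradients into the consequent layer are tiny; rescaling them with LN is what restores useful gradient magnitudes in this sketch.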
