Non-symmetric over-time pooling using pseudo-grouping functions for convolutional neural networks

Mikel Ferrero-Jaurrieta, Anderson Cruz, Zdenko Takáč, Laura De Miguel, Humberto Bustince, Rui Paiva, Benjamín Bedregal, Carlos Lopez-Molina

doi:10.1016/j.engappai.2024.108470

Abstract

Convolutional Neural Networks (CNNs) are a family of networks that have become state-of-the-art in several fields of artificial intelligence due to their ability to extract spatial features. In the context of natural language processing, they can be used to build text classification models based on textual features between words. These networks fuse local features to generate global features in their over-time pooling layers. These layers have traditionally been built using the maximum function or other symmetric functions such as the arithmetic mean. It is important to note that the order of the input local features is significant, i.e. symmetry is not an inherent characteristic of the model. While symmetric pooling is appropriate for image-oriented CNNs, where symmetry might make the network robust to rigid image transformations, it seems counter-productive for text processing, where the order of the words is certainly important. Our proposal is, hence, to use non-symmetric pooling operators to replace the maximum or average functions. Specifically, we propose to perform over-time pooling using pseudo-grouping functions, a family of non-symmetric aggregation operators that generalize the maximum function. We present a construction method for pseudo-grouping functions and apply different examples of this family to over-time pooling layers in text-oriented CNNs. Our proposal is tested on seven different models and six different datasets in the context of engineering applications such as text classification. The results show an overall improvement of the models when using non-symmetric pseudo-grouping functions over the traditional pooling function.
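To illustrate the contrast between symmetric and non-symmetric over-time pooling, the following minimal Python sketch may help. It is not the construction method presented in the paper: the binary function `pseudo_max`, its exponents `p` and `q`, and the left-to-right fold over time steps are all hypothetical choices made for illustration, and the local features are assumed to lie in [0, 1].

```python
from functools import reduce


def max_over_time(features):
    """Symmetric baseline: the maximum ignores the order of the features."""
    return max(features)


def pseudo_max(x, y, p=1.0, q=2.0):
    """Hypothetical non-symmetric aggregation on [0, 1]: max(x**p, y**q).

    With p == q == 1 this reduces to the ordinary maximum.
    With p != q the two arguments are treated differently.
    """
    return max(x ** p, y ** q)


def nonsymmetric_over_time(features, p=1.0, q=2.0):
    """Fold the binary function over the time axis, left to right.

    The fold order is an assumption of this sketch, not taken from the paper.
    """
    return reduce(lambda acc, v: pseudo_max(acc, v, p, q), features)


# Two feature sequences with the same values in a different order.
a = [0.9, 0.2, 0.5]
b = [0.5, 0.2, 0.9]

print(max_over_time(a), max_over_time(b))                    # identical values
print(nonsymmetric_over_time(a), nonsymmetric_over_time(b))  # different values
```

In this sketch the maximum returns the same value for both sequences, whereas the non-symmetric fold distinguishes them, which is the kind of order sensitivity the proposal aims to exploit for text.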

Full Text