Abstract

Feed-forward, fully connected artificial neural networks, or so-called multi-layer perceptrons, are well-known universal approximators. However, their learning performance varies significantly depending on the function or solution space that they attempt to approximate. This is mainly because of their homogeneous configuration based solely on the linear neuron model. Therefore, while they learn very well problems with a monotonic, relatively simple, and linearly separable solution space, they may entirely fail to do so when the solution space is highly nonlinear and complex. Since conventional convolutional neural networks (CNNs) share the same linear neuron model with two additional constraints (local connections and weight sharing), this is also true for them, and it is therefore not surprising that in many challenging problems only deep CNNs with massive complexity and depth can achieve the required diversity and learning performance. To address this drawback and to accomplish a more generalized model than the convolutional neuron, this study proposes a novel network model, called operational neural networks (ONNs), which can be heterogeneous and encapsulate neurons with any set of operators to boost diversity and to learn highly complex and multi-modal functions or spaces with minimal network complexity and training data. Finally, the training method to back-propagate the error through the operational layers of ONNs is formulated. Experimental results over highly challenging problems demonstrate the superior learning capabilities of ONNs even with few neurons and hidden layers.
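
To make the generalization over the convolutional neuron concrete, the sketch below contrasts a conventional convolutional neuron, whose nodal operator is fixed to multiplication and whose pool operator is fixed to summation, with an operational neuron whose nodal and pool operators are configurable. This is only an illustrative 1-D sketch under assumed operator choices (e.g., a sinusoidal nodal operator with median pooling), not the paper's formulation or code.

    import numpy as np

    def conv_neuron(x, w, b=0.0):
        # Conventional 1-D convolutional neuron: nodal op = multiply, pool op = sum.
        k = len(w)
        out = np.empty(len(x) - k + 1)
        for i in range(len(out)):
            out[i] = np.sum(w * x[i:i + k]) + b
        return np.tanh(out)  # fixed activation operator

    def operational_neuron(x, w, nodal, pool, b=0.0):
        # Operational neuron: nodal and pool operators are arbitrary callables.
        k = len(w)
        out = np.empty(len(x) - k + 1)
        for i in range(len(out)):
            out[i] = pool(nodal(w, x[i:i + k])) + b
        return np.tanh(out)

    x = np.linspace(-1.0, 1.0, 16)
    w = np.array([0.5, -0.2, 0.3])

    # With multiply/sum, the operational neuron reduces to the convolutional one ...
    assert np.allclose(conv_neuron(x, w),
                       operational_neuron(x, w, nodal=np.multiply, pool=np.sum))

    # ... but other (assumed, illustrative) operator choices yield neuron types that
    # a homogeneous convolutional layer cannot express.
    y = operational_neuron(x, w, nodal=lambda w, s: np.sin(w * s), pool=np.median)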

Highlights

  • The conventional fully connected and feed-forward neural networks, such as multi-layer perceptrons (MLPs) and radial basis functions (RBFs), are universal approximators

  • There have been some attempts in the literature to modify MLPs by changing the neuron model and/or the conventional BP algorithm [18,19,20], or the parameter updates [21, 22]; however, their performance improvements were not significant in general, since such approaches still inherit the main drawback of MLPs, i.e., a homogeneous network configuration with the same neuron model

  • The operational neural networks (ONNs) proposed in this study are inspired by two basic facts: (1) bioneurological systems, including the mammalian visual system, are based on heterogeneous, nonlinear neurons with varying synaptic connections, and (2) heterogeneous ANN models encapsulating nonlinear neurons have recently demonstrated a learning performance that cannot be achieved by their conventional linear counterparts (e.g., MLPs) unless significantly deeper and more complex configurations are used [25,26,27,28]


Summary

Introduction

1.1 Problem formulation

The conventional fully connected and feed-forward neural networks, such as multi-layer perceptrons (MLPs) and radial basis functions (RBFs), are universal approximators. While there has recently been a lot of activity in searching for good network architectures based on the data at hand, either progressively [4, 5] or by following extremely laborious search strategies [6,7,8,9,10], the resulting network architectures may still exhibit varying or entirely unsatisfactory performance levels, especially when facing highly complex and nonlinear problems. This is mainly because all such traditional neural networks employ a homogeneous network structure consisting of only a crude model of the biological neuron. Extensions of the MLP networks for end-to-end learning of 2D (visual) signals, i.e., convolutional neural networks (CNNs), and of time-series data, i.e., recurrent neural networks (RNNs) and long short-term memories (LSTMs), naturally inherit the same limitations originating from the traditional neuron model.
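
As a reference for the "crude model of the biological neuron" mentioned above, the following minimal sketch shows the traditional linear neuron shared by MLPs, CNNs, RNNs, and LSTMs: a weighted sum of the inputs followed by a fixed nonlinearity. The variable names and the tanh activation are illustrative assumptions, not taken from the paper.

    import numpy as np

    def linear_neuron(inputs, weights, bias, activation=np.tanh):
        # Traditional (McCulloch-Pitts style) neuron: y = f(w . x + b).
        # Every neuron in a homogeneous MLP, CNN, RNN, or LSTM uses this same fixed model.
        return activation(np.dot(weights, inputs) + bias)

    y = linear_neuron(np.array([0.2, -0.7, 1.0]), np.array([0.5, 0.1, -0.3]), bias=0.05)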
