Convolutional neural networks (CNNs) have recently seen tremendous success in various computer vision tasks. However, their application to problems with high-dimensional input and output, such as high-resolution image and video segmentation or 3D medical imaging, has been limited by several factors. Primarily, in the training stage, it is necessary to store network activations for back-propagation. In these settings, the memory required to store activations can exceed what is feasible on current hardware, especially for problems in 3D. Motivated by the propagation of signals over physical networks, which is governed by the hyperbolic Telegraph equation, in this work we introduce a fully conservative hyperbolic network for problems with high-dimensional input and output. We introduce a coarsening operation that enables fully reversible CNNs by using a learnable discrete wavelet transform and its inverse to both coarsen and interpolate the network state and to change the number of channels. We show that fully reversible networks achieve results comparable to the state of the art in 4D time-lapse hyperspectral image segmentation and full 3D video segmentation, with a much lower memory footprint that is constant, independent of network depth. We also extend such networks to variational auto-encoders, where optimization starts from an exact recovery and the level of compression is discovered through optimization.
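To make the two ingredients concrete, below is a minimal NumPy sketch, not the authors' code: it assumes a leapfrog discretization of the hyperbolic dynamics and uses a fixed orthonormal Haar transform as a stand-in for the learnable wavelet described in the abstract. All function names (`haar_down`, `leapfrog_forward`, etc.) are illustrative, and plain `tanh` replaces the convolutional blocks.

```python
import numpy as np

# --- Invertible coarsening via an orthonormal Haar transform (stand-in for
# --- the paper's learnable wavelet) ------------------------------------------
def haar_down(x):
    """(C, H, W) -> (4C, H/2, W/2); exactly invertible, so coarsening
    loses no information about the network state while quadrupling channels."""
    a, b = x[:, 0::2, 0::2], x[:, 0::2, 1::2]
    c, d = x[:, 1::2, 0::2], x[:, 1::2, 1::2]
    return np.concatenate([(a + b + c + d) / 2,    # coarse average
                           (a - b + c - d) / 2,    # horizontal detail
                           (a + b - c - d) / 2,    # vertical detail
                           (a - b - c + d) / 2],   # diagonal detail
                          axis=0)

def haar_up(z):
    """Exact inverse of haar_down: (4C, H/2, W/2) -> (C, H, W)."""
    k = z.shape[0] // 4
    s, h, v, d = z[:k], z[k:2*k], z[2*k:3*k], z[3*k:]
    x = np.empty((k, 2 * s.shape[1], 2 * s.shape[2]), dtype=z.dtype)
    x[:, 0::2, 0::2] = (s + h + v + d) / 2
    x[:, 0::2, 1::2] = (s - h + v - d) / 2
    x[:, 1::2, 0::2] = (s + h - v - d) / 2
    x[:, 1::2, 1::2] = (s - h - v + d) / 2
    return x

# --- Reversible hyperbolic (leapfrog) dynamics -------------------------------
def leapfrog_forward(y_prev, y_curr, layers, h=0.1):
    """Two-step scheme y_{j+1} = 2*y_j - y_{j-1} + h^2 * f_j(y_j)."""
    for f in layers:
        y_prev, y_curr = y_curr, 2 * y_curr - y_prev + h * h * f(y_curr)
    return y_prev, y_curr

def leapfrog_backward(y_prev, y_curr, layers, h=0.1):
    """Recovers all earlier states from the final two, so activations need
    not be stored during training: memory stays constant in network depth."""
    for f in reversed(layers):
        y_prev, y_curr = 2 * y_prev - y_curr + h * h * f(y_prev), y_prev
    return y_prev, y_curr

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y0, y1 = rng.standard_normal((2, 3, 8, 8))
    layers = [np.tanh] * 5                         # stand-in for conv blocks
    a, b = leapfrog_forward(y0, y1, layers)
    r0, r1 = leapfrog_backward(a, b, layers)
    print(np.allclose(r0, y0), np.allclose(r1, y1))   # True True
    x = rng.standard_normal((3, 8, 8))
    print(np.allclose(haar_up(haar_down(x)), x))      # True
```

Because both the leapfrog step and the wavelet coarsening are exactly invertible, any intermediate state can be recomputed from the final two states during back-propagation, which is what yields the depth-independent memory footprint claimed above.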