Expressiveness of Shallow Networks

  • Abstract
  • Literature Map
  • Similar Papers
Abstract

In this chapter, it is proved that the set of multivariate functions generated by shallow networks is dense in the space of continuous functions on a compact set if and only if the activation function is not a polynomial. For the specific choice of the ReLU activation function, a two-sided estimate of the approximation rate of Lipschitz functions by shallow networks is also provided. The argument for the lower estimate makes use of an upper estimate on the VC-dimension of shallow ReLU networks.
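The density claim can be illustrated numerically. The sketch below (illustrative only, not from the chapter) fits the Lipschitz function |x| with a one-hidden-layer ReLU network whose hidden weights are random and whose output weights are solved by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Target: a simple Lipschitz function on [-1, 1].
x = np.linspace(-1.0, 1.0, 200)
y = np.abs(x)

# Shallow network: random hidden layer, output weights fit by least squares.
n_hidden = 50
w = rng.uniform(-1.0, 1.0, n_hidden)          # hidden weights
b = rng.uniform(-1.0, 1.0, n_hidden)          # hidden biases
H = relu(np.outer(x, w) + b)                  # hidden activations, (200, 50)
c, *_ = np.linalg.lstsq(H, y, rcond=None)     # output weights

err = np.max(np.abs(H @ c - y))
print(f"max approximation error: {err:.4f}")
```

Increasing `n_hidden` drives the error down, consistent with density; a polynomial activation would instead confine the network to a fixed finite-dimensional polynomial space, which is why the polynomial case fails.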

Similar Papers
  • Research Article
  • Cited by 105
  • 10.1109/access.2019.2903582
End-to-End Image Super-Resolution via Deep and Shallow Convolutional Networks
  • Jan 1, 2019
  • IEEE Access
  • Yifan Wang + 3 more

In this paper, we propose a new image super-resolution (SR) approach based on a convolutional neural network (CNN), which jointly learns the feature extraction, upsampling, and high-resolution (HR) reconstruction modules, yielding a completely end-to-end trainable deep CNN. However, directly training such a deep network in an end-to-end fashion is challenging, which takes a longer time to converge and may lead to sub-optimal results. To address this issue, we propose to jointly train an ensemble of deep and shallow networks. The shallow network with weaker learning capability restores the main structure of the image content, while the deep network with stronger representation power captures the high-frequency details. Since the shallow network is much easier to optimize, it significantly lowers the difficulty of deep network optimization during joint training. To further ensure more accurate restoration of HR images, the high-frequency details are reconstructed in a multi-scale manner to simultaneously incorporate both short- and long-range contextual information. The proposed method is extensively evaluated on widely adopted data sets and compares favorably against state-of-the-art methods. In-depth ablation studies are conducted to verify the contributions of different network designs to image SR, providing additional insights for future research.

  • Conference Article
  • Cited by 58
  • 10.1109/ccwc.2018.8301755
Comparison of shallow and deep neural networks for network intrusion detection
  • Jan 1, 2018
  • Daniel E Kim + 1 more

The increasing complexity and malice of modern computer and network attacks drive the search for more adaptive and smarter intrusion detection methods. Neural networks can provide a useful, self-learning approach to threat detection for network intrusion. After testing a variety of simple shallow and deep neural networks on the well-known NSL-KDD dataset, comprised of a network traffic capture containing 148,000 observations and 41 features with 22 specific attacks, we confirm the findings of previous researchers [15] that shallow neural networks are better suited for network intrusion detection than deep neural networks. Shallow networks classified the network data more accurately and produced lower error rates than the deep networks.
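The shallow-versus-deep comparison this abstract describes can be sketched as follows (the NSL-KDD data are not bundled here, so a synthetic stand-in with 41 features is used; the layer sizes are illustrative assumptions, not the paper's architectures):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the NSL-KDD feature matrix (41 features in the real set).
X, y = make_classification(n_samples=2000, n_features=41, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One hidden layer vs. four hidden layers, otherwise identical settings.
shallow = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
deep = MLPClassifier(hidden_layer_sizes=(64, 64, 64, 64), max_iter=500,
                     random_state=0)

for name, clf in [("shallow", shallow), ("deep", deep)]:
    clf.fit(X_tr, y_tr)
    print(name, clf.score(X_te, y_te))
```

Scores on synthetic data will not reproduce the paper's finding; the sketch only shows the shape of the experimental comparison.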

  • Conference Article
  • Cited by 25
  • 10.1109/icacci.2017.8126143
Secure shell (ssh) traffic analysis with flow based features using shallow and deep networks
  • Sep 1, 2017
  • R Vinayakumar + 2 more

The primary objective of this work is to evaluate the effectiveness of various shallow and deep networks for characterizing and classifying encrypted traffic such as secure shell (SSH). The SSH traffic statistical feature sets are estimated from various private and public traces. The private trace is from NIMS (Network Information Management and Security Group); the public traces are from MAWI (Measurement and Analysis on the WIDE Internet) and NLANR's (National Laboratory for Applied Network Research) Active Measurement Project (AMP). To select optimal deep networks, experiments are done over various network parameters, structures and topologies. All experiments are run for up to 1000 epochs with the learning rate in the range [0.01-0.5]. The various shallow and deep networks are trained on the public traces and evaluated on the private trace, and vice versa. Results indicate that SSH traffic can be detected with an acceptable detection rate. The deep networks performed well in comparison to the shallow networks; moreover, the performance of the various shallow networks is comparable.

  • Research Article
  • Cited by 25
  • 10.1016/j.neunet.2019.11.006
Dimension independent bounds for general shallow networks
  • Nov 22, 2019
  • Neural Networks
  • H. N. Mhaskar


  • Research Article
  • 10.1108/jimse-02-2025-0002
DSF-Net: semantic segmentation of large-scale point clouds based on integrating deep and shallow networks
  • May 13, 2025
  • Journal of Intelligent Manufacturing and Special Equipment
  • Gang Xiao + 6 more

Purpose With the upgrading of three-dimensional (3D) sensing devices, the amount of point cloud data collected has increased exponentially. However, most existing methods are poorly balanced between memory consumption and semantic segmentation efficiency. This research addresses the need for a more balanced approach to processing large-scale point cloud data efficiently. Design/methodology/approach This research uses a network framework (DSF-Net) based on dual-path deep and shallow networks and designs a point cloud spatial pyramid pooling module based on atrous (hole) convolution. The 3D point cloud data are trained separately by the deep-branch and shallow-branch networks. A deep and shallow fusion module then fuses the deep and shallow feature relationships and outputs several loss functions for convergence training. Findings DSF-Net improves segmentation efficiency, achieves a balanced effect while retaining the ability to ingest large-scale point cloud input, and reduces memory consumption. Originality/value The deep network can extract high-level semantic information, while the shallow network has fewer layers and faster inference. Random sampling and point-atrous spatial pyramid pooling modules are used, respectively, by the deep and shallow networks to capture multi-scale local context in the point cloud.

  • Research Article
  • Cited by 7
  • 10.1007/s40747-024-01594-x
SDGSA: a lightweight shallow dual-group symmetric attention network for micro-expression recognition
  • Aug 14, 2024
  • Complex & Intelligent Systems
  • Zhengyang Yu + 2 more

Recognizing micro-expressions (MEs) as subtle and transient forms of human emotional expressions is critical for accurately judging human feelings. However, recognizing MEs is challenging due to their transient and low-intensity characteristics. This study develops a lightweight shallow dual-group symmetric attention network (SDGSA) to address the limitations of existing methods in capturing the subtle features of MEs. This network takes the optical flow features as inputs, extracting ME features through a shallow network and performing finer feature segmentation in the channel dimension through a dual-group strategy. The goal is to focus on different types of facial information without disrupting facial symmetry. Moreover, this study implements a spatial symmetry attention module, focusing on extracting facial symmetry features to emphasize further the symmetric information of the left and right sides of the face. Additionally, we introduce the channel blending technique to optimize the information fusion between different channel features. Extensive experiments on SMIC, CASME II, SAMM, and 3DB-combined mainstream ME datasets demonstrate that the proposed SDGSA method outperforms the metrics of current state-of-the-art methods. As shown by ablation experimental results, the proposed dual-group symmetric attention module outperforms classical attention modules, such as the convolutional block attention module, squeeze-and-excitation, efficient channel attention, spatial group-wise enhancement, and multi-head self-attention. Importantly, SDGSA maintained excellent performance while having only 0.278 million parameters. The code and model are publicly available at https://github.com/YZY980123/SDGSA.

  • Research Article
  • 10.3397/in_2023_0511
A comparison of the classification performance of shallow and deep convolutional neural networks in small active sonar datasets
  • Nov 30, 2023
  • INTER-NOISE and NOISE-CON Congress and Conference Proceedings
  • Geunhwan Kim + 7 more

An active sonar system transmits and receives a pulse of short duration to detect and track underwater targets. Detection narrows the candidates to multiple contacts, only some of which are actual targets; classification is then conducted to find the targets among the contacts. Previous classification studies have used conventional machine learning techniques, including support vector machines, and displayed limited performance. With the recent remarkable development of deep learning using deep neural networks, it is being introduced to active sonar classification. However, the superior performance of deep learning is guaranteed only when big data are available. In active sonar classification, deep learning-based performance may deteriorate on small active sonar datasets, and shallow networks could be an alternative. Here, we compare and analyze the classification performance of shallow and deep convolutional neural networks using in-situ active sonar datasets.

  • Research Article
  • Cited by 39
  • 10.1016/j.neunet.2017.04.003
Probabilistic lower bounds for approximation by shallow perceptron networks
  • Apr 19, 2017
  • Neural Networks
  • Věra Kůrková + 1 more


  • Research Article
  • Cited by 21
  • 10.1109/tnnls.2023.3259016
Class-Incremental Learning Method With Fast Update and High Retainability Based on Broad Learning System.
  • Aug 1, 2024
  • IEEE transactions on neural networks and learning systems
  • Jie Du + 5 more

Machine learning aims to generate a predictive model from a training dataset of a fixed number of known classes. However, many real-world applications (such as health monitoring and elderly care) are data streams in which new data arrive continually in a short time. Such new data may even belong to previously unknown classes. Hence, class-incremental learning (CIL) is necessary, which incrementally and rapidly updates an existing model with the data of new classes while retaining the existing knowledge of old classes. However, most current CIL methods are designed based on deep models that require a computationally expensive training and update process. In addition, deep learning based CIL (DCIL) methods typically employ stochastic gradient descent (SGD) as an optimizer that forgets the old knowledge to a certain extent. In this article, a broad learning system-based CIL (BLS-CIL) method with fast update and high retainability of old class knowledge is proposed. Traditional BLS is a fast and effective shallow neural network, but it does not work well on CIL tasks. However, our proposed BLS-CIL can overcome these issues and provide the following: 1) high accuracy due to our novel class-correlation loss function that considers the correlations between old and new classes; 2) significantly short training/update time due to the newly derived closed-form solution for our class-correlation loss without iterative optimization; and 3) high retainability of old class knowledge due to our newly derived recursive update rule for CIL (RULL) that does not replay the exemplars of all old classes, as contrasted to the exemplars-replaying methods with the SGD optimizer. The proposed BLS-CIL has been evaluated over 12 real-world datasets, including seven tabular/numerical datasets and six image datasets, and the compared methods include one shallow network and seven classical or state-of-the-art DCIL methods. 
Experimental results show that our BLS-CIL can significantly improve classification performance over a shallow network by a large margin (8.80%-48.42%). It also achieves comparable or even higher accuracy than DCIL methods, while greatly reducing the training time from hours to minutes and the update time from minutes to seconds.
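The "closed-form solution instead of SGD" idea behind broad learning systems can be sketched in a few lines. This is generic ridge regression over a random shallow feature layer, not the paper's BLS-CIL with its class-correlation loss; the data and all sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_features(X, W, b):
    # Shallow "enhancement" layer in the spirit of a broad learning system.
    return np.tanh(X @ W + b)

# Toy data: two Gaussian blobs with labels -1 / +1.
n, d, h = 400, 5, 100
X = np.vstack([rng.normal(-1, 1, (n // 2, d)), rng.normal(1, 1, (n // 2, d))])
y = np.hstack([-np.ones(n // 2), np.ones(n // 2)])

W = rng.normal(size=(d, h))
b = rng.normal(size=h)
H = random_features(X, W, b)

# Closed-form ridge solution for the output weights: no iterative optimizer,
# hence no SGD-style forgetting and a training time measured in milliseconds.
lam = 1e-2
beta = np.linalg.solve(H.T @ H + lam * np.eye(h), H.T @ y)

acc = np.mean(np.sign(H @ beta) == y)
print(f"training accuracy: {acc:.3f}")
```

Solving one h-by-h linear system replaces the whole gradient-descent loop, which is why such shallow models update in seconds rather than minutes.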

  • Conference Article
  • Cited by 2
  • 10.1109/dasc/picom/datacom/cyberscitec.2018.00019
Target Detection Based on Cascade Network and Densely Connected Network in Remote Sensing Image
  • Aug 1, 2018
  • Jihao Wang + 3 more

To address the low detection efficiency of convolutional neural networks on remote sensing images, a faster detection algorithm based on a three-level cascade convolutional densely connected network model is established for target detection in high-resolution remote sensing images. Firstly, a shallow fully convolutional network is used as the first level of the cascaded network. The shallow network runs fast and can quickly filter out incorrect windows in the remote sensing image; being fully convolutional, it builds the sliding window into the network and improves efficiency. Secondly, a shallow network is used as the second level to filter the windows again after the first-level detection. Thirdly, the third level uses a small densely connected network (DenseNet), which has fewer parameters and is more accurate; it identifies the windows retained by the second level. The data used in the test are No. 2 high-resolution satellite remote sensing images. The experiment shows that for pictures of the same size, with little difference in accuracy, the computation time of the plain convolutional neural network is about 1.3-1.8 times that of the cascaded network.

  • Research Article
  • Cited by 1
  • 10.1088/1742-6596/1237/3/032025
Video Logging Casing Damage Image Recognition Based on Improved Convolutional Neural Network
  • Jun 1, 2019
  • Journal of Physics: Conference Series
  • Hongtao Hu + 1 more

Oil casing damage detection is key to ensuring smooth oil field production. In recent years, automatic image recognition based on deep learning has become a hot research topic, but common deep learning models have defects in identifying the target features of casing damage images in complex environments. This paper proposes an oil casing damage image recognition model based on DS-CNN (deep and shallow convolutional neural network). Based on VGG19, the model integrates a shallow convolutional neural network: it combines global features extracted by the shallow network with local features extracted by the deep network to form the input of the fully connected layer. Joint training of the shallow and deep networks lets the image be represented at multiple scales, improving the recognition accuracy of the entire model. The experimental data are obtained from a downhole casing image dataset of an oil field in Sichuan. Experimental results show that the macro-average F1 scores of DS-CNN are 4.41 and 5.74 percentage points higher than those of the VGG19 and GoogleNet models, indicating that this model improves the recognition accuracy of oil casing damage images.

  • Conference Article
  • Cited by 3
  • 10.1109/i2ct45611.2019.9033955
Energy Disaggregation for NILM applications using Shallow and Deep Networks
  • Mar 1, 2019
  • Lakshmi Nambiar + 2 more

A comparative study is conducted by applying deep and shallow networks to energy data for non-intrusive load monitoring (NILM) applications. From the total energy consumption, the consumption of various appliances can be disaggregated using shallow and deep networks. The study highlights an effective algorithmic method for energy disaggregation from total consumption, rather than the expensive method of separately metering individual loads. The shallow algorithm support vector regression (SVR) and deep learning algorithms such as the deep neural network (DNN) and long short-term memory (LSTM) are used in this study and their performance is evaluated.
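A minimal SVR-based disaggregation sketch on synthetic data (the appliance waveforms, window length, and hyperparameters are all assumptions for illustration, not values from the paper):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
t = np.arange(2000)

# Synthetic aggregate signal: a cycling appliance plus an intermittent one.
fridge = 100.0 * (np.sin(t / 50) > 0)          # periodic on/off load
heater = 500.0 * (rng.random(2000) < 0.2)      # sporadic high-power load
total = fridge + heater + rng.normal(0, 5, 2000)

# Regress one appliance's signal from a short sliding window of the aggregate.
w = 5
Xw = np.lib.stride_tricks.sliding_window_view(total, w)
yw = fridge[w - 1:]

model = SVR(kernel="rbf", C=100.0).fit(Xw[:1500], yw[:1500])
pred = model.predict(Xw[1500:])
print("predicted samples:", pred.shape[0])
```

The same windowed framing carries over to the DNN and LSTM variants the abstract mentions; only the regressor changes.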

  • Book Chapter
  • 10.1007/978-981-16-1194-0_15
NCLRNet: A Shallow Network and Non-convex Low-Rank Based Fabric Defect Detection
  • Jan 1, 2021
  • Ban Jiang + 4 more

Fabric images have complex and regular texture features, and defects destroy this regularity, so they can be considered sparse parts of the background. The low-rank representation technique has been proven applicable to fabric defect detection: it decomposes a fabric image into sparse parts and a redundant background. The traditional low-rank representation model is solved by a convex surrogate, which results in an inaccurate solution. In addition, the performance of the low-rank representation model relies on the characterization capability of the feature descriptor, but hand-crafted features cannot effectively describe complex fabric texture. To solve these issues, we propose a fabric defect detection algorithm based on a shallow network and non-convex low-rank representation (NCLRNet). We design a shallow convolutional neural network to improve the efficiency of feature extraction, and introduce a non-convex method into the low-rank representation model to obtain an accurate solution. Moreover, the detection results of different feature layers are fused by a double low-rank matrix representation algorithm to achieve better detection performance. Experimental results on fabric images demonstrate the effectiveness and robustness of the proposed method.

  • Conference Article
  • Cited by 30
  • 10.1109/icassp49357.2023.10096837
Learning From Yourself: A Self-Distillation Method For Fake Speech Detection
  • Jun 4, 2023
  • Jun Xue + 6 more

In this paper, we propose a novel self-distillation method for fake speech detection (FSD), which can significantly improve FSD performance without increasing model complexity. For FSD, fine-grained information such as spectrogram defects and mute segments is very important and is often perceived by shallow networks; however, shallow networks contain much noise and cannot capture it well. To address this problem, we propose using the deepest network to instruct and thereby enhance the shallow networks. Specifically, the FSD network is divided into several segments, with the deepest network used as the teacher model and all shallow networks made into multiple student models by adding classifiers. Meanwhile, a distillation path between the deepest network's features and the shallow networks' features is used to reduce the feature difference. A series of experiments on the ASVspoof 2019 LA and PA datasets shows the effectiveness of the proposed method, with significant improvements over the baseline.

  • Research Article
  • Cited by 43
  • 10.1016/j.jcp.2023.112084
Greedy training algorithms for neural networks and applications to PDEs
  • Mar 23, 2023
  • Journal of Computational Physics
  • Jonathan W Siegel + 4 more

