Fast Neural Network Research Articles

Since its creation, the ImageNet-1k dataset has played a significant role as a benchmark for ascertaining the accuracy of different deep neural net (DNN) models on the image classification problem. Moreover, in recent years it has also served as the principal benchmark for assessing different approaches to DNN training. Finishing a 90-epoch ImageNet-1k training with ResNet-50 on an NVIDIA M40 GPU takes 14 days. This training requires $10^{18}$ single precision operations in total. On the other hand, the world's current fastest supercomputer can finish $3 \times 10^{17}$ single precision operations per second (according to the Nov 2018 Top 500 results). If we can make full use of the computing capability of the fastest supercomputer, we should be able to finish the training in several seconds. Over the last two years, researchers have focused on closing this significant performance gap by scaling DNN training to larger numbers of processors. Most successful approaches to scaling ImageNet training have used synchronous mini-batch stochastic gradient descent (SGD). However, to scale synchronous SGD one must also increase the batch size used in each iteration. Thus, for many researchers, the focus on scaling DNN training has translated into a focus on developing training algorithms that allow the batch size in data-parallel synchronous SGD to increase without losing accuracy over a fixed number of epochs. In this paper, we investigate supercomputers' capability of speeding up DNN training. Our approach is to use a large batch size, powered by the Layer-wise Adaptive Rate Scaling (LARS) algorithm, for efficient usage of massive computing resources. Our approach is generic: we empirically evaluate its effectiveness on five neural networks (AlexNet, AlexNet-BN, GNMT, ResNet-50, and ResNet-50-v2) trained with large datasets while preserving state-of-the-art test accuracy. Compared to the baseline of a previous study from Goyal et al. [1], our approach shows higher test accuracy on batch sizes that are larger than 16K. When we use the same baseline, our results are better than those of Goyal et al. for all the batch sizes (Fig. 20). Using 2,048 Intel Xeon Platinum 8160 processors, we reduce the 100-epoch AlexNet training time from hours to 11 minutes. With 2,048 Intel Xeon Phi 7250 processors, we reduce the 90-epoch ResNet-50 training time from hours to 20 minutes. Our implementation is open source and has been released in the Intel distribution of Caffe, Facebook's PyTorch, and Google's TensorFlow. This paper differs from the conference version of our work [2] in three ways: (1) we implement our approach on Google's cloud Tensor Processing Unit (TPU) platform, which verifies our previous success on CPUs and GPUs; (2) we scale the batch size of ResNet-50-v2 to 32K and achieve 76.3 percent accuracy, which is better than the 75.3 percent accuracy achieved in our conference paper; (3) we apply our approach to Google's Neural Machine Translation (GNMT) application, which helps us achieve a 4× speedup on the cloud TPUs.
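The "several seconds" estimate in the abstract above can be checked with a few lines of arithmetic. The sketch below uses only the three figures quoted in the abstract (total operations, peak supercomputer throughput, and the 14-day single-GPU time); the function and variable names are illustrative, and the result is an idealized bound that assumes perfect utilization, which real training runs would not reach.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# All three inputs come from the abstract; everything else is derived.
TOTAL_OPS = 1e18          # single precision operations, 90-epoch ResNet-50
PEAK_OPS_PER_SEC = 3e17   # fastest supercomputer, Nov 2018 Top 500
M40_DAYS = 14             # training time on one NVIDIA M40 GPU

SECONDS_PER_DAY = 24 * 60 * 60


def ideal_seconds(total_ops: float, ops_per_sec: float) -> float:
    """Time to finish if every operation ran at peak throughput."""
    return total_ops / ops_per_sec


def sustained_ops_per_sec(total_ops: float, days: float) -> float:
    """Average throughput implied by a measured wall-clock time."""
    return total_ops / (days * SECONDS_PER_DAY)


ideal = ideal_seconds(TOTAL_OPS, PEAK_OPS_PER_SEC)
m40_rate = sustained_ops_per_sec(TOTAL_OPS, M40_DAYS)
gap = (M40_DAYS * SECONDS_PER_DAY) / ideal

print(f"Ideal time at peak throughput: {ideal:.2f} s")      # about 3.3 s
print(f"Implied sustained M40 rate:    {m40_rate:.2e} ops/s")
print(f"Gap between the two:           {gap:,.0f}x")
```

The ratio between the 14-day run and the idealized bound is the "performance gap" the abstract refers to; scaling synchronous SGD to more processors is an attempt to close part of it.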

A rigorous and robust first-principles-based Homogeneous Surface Diffusion Model (HSDM) is demonstrated for numerical simulation and estimation of surface diffusivities for single, binary and ternary systems involving dyes and pharmaceutical molecules. The novelty of the current work lies in proposing a fast, reliable and efficient Artificial Neural Network (ANN) surrogate to the mechanistic HSDM. Repeated numerical integration of the model's partial differential equations during parameter estimation from batch adsorption kinetics data is highly time-consuming and is not required for the proposed approach. The ANN was trained on a small number of HSDM simulations and limited experimental batch kinetics data with different combinations of surface diffusivity (DS) values. ANNs were developed and tested against the experimentally obtained batch kinetics data for various systems. The trained ANN was able to capture the kinetics that were rigorously predicted using HSDM. Similarities of 99.9%, 98.6% and 99.3% were achieved between DS values estimated using HSDM and ANN for single, binary and ternary systems respectively. Similarly, the batch kinetics data was almost identically tracked by the ANN. The computational time required per simulation by the ANN was about 14 times lower, and the total parameter estimation time about 17 times lower, than with HSDM. The ANN developed for estimating parameters could also be operated in reverse to simulate the multicomponent batch adsorption kinetics and track the increase in percentage removal of the solutes with time at different process conditions. Irrespective of the number of components, the ANNs' performance was consistent. The ratio of neurons and their total number in the hidden layers had a significant impact on the performance. Hence optimization of network parameters is essential to realize the benefits of ANN. The shortcomings of empirical kinetic models, namely the Pseudo First Order (PFO) and Pseudo Second Order (PSO) models, were also demonstrated. This work demonstrates the utility of ANNs in rigorous multicomponent adsorption kinetics applications and has considerable potential in real-time optimization and operation of wastewater treatment plants.
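For readers unfamiliar with the two empirical models named above, the sketch below evaluates them in their commonly used integrated forms. The abstract does not state the equations, so these forms are an assumption: $q_t = q_e(1 - e^{-k_1 t})$ for PFO and $q_t = k_2 q_e^2 t / (1 + k_2 q_e t)$ for PSO. The function names and all parameter values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def pfo_uptake(t: float, q_e: float, k1: float) -> float:
    """Pseudo First Order uptake: q_t = q_e * (1 - exp(-k1 * t))."""
    return q_e * (1.0 - math.exp(-k1 * t))


def pso_uptake(t: float, q_e: float, k2: float) -> float:
    """Pseudo Second Order uptake: q_t = k2 * q_e^2 * t / (1 + k2 * q_e * t)."""
    return (k2 * q_e ** 2 * t) / (1.0 + k2 * q_e * t)


# Hypothetical parameters (not from the paper): equilibrium uptake q_e in mg/g,
# rate constants k1 in 1/min and k2 in g/(mg*min), time in minutes.
Q_E, K1, K2 = 50.0, 0.05, 0.002

for minutes in (0, 10, 60, 240):
    print(
        f"t={minutes:>3} min  "
        f"PFO={pfo_uptake(minutes, Q_E, K1):6.2f}  "
        f"PSO={pso_uptake(minutes, Q_E, K2):6.2f}"
    )
```

Both curves are fitted with lumped rate constants rather than a physically meaningful surface diffusivity, which is the kind of limitation a mechanistic model such as HSDM is meant to avoid.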

Articles published on Fast Neural Network

ICELIA: A Full-Stack Framework for STT-MRAM-Based Deep Learning Acceleration

Localization of common carotid artery transverse section in B-mode ultrasound images using faster RCNN: a deep learning approach.

Fast and resource-efficient Deep Neural Network on FPGA for the Phase-II Level-0 muon barrel trigger of the ATLAS experiment

Sensitivity and Specificity of Non-Invasive Blood Glucose Level Measurement Optical Device to Detect Hypoglycaemia.

An ELM-Embedded Deep Learning Based Intelligent Recognition System for Computer Numeric Control Machine Tools

Segmented analysis of time-of-flight diffraction ultrasound for flaw detection in welded steel plates using extreme learning machines

Neural network models for actual duration of Greek highway projects

Incremental Wishart Broad Learning System for Fast PolSAR Image Classification

Real-time discrimination of photon pairs using machine learning at the LHC

Fast Convolutional Neural Networks in Low Density FPGAs Using Zero-Skipping and Weight Pruning

Fast Deep Neural Network Training on Distributed Systems and Cloud TPUs

Application of the artificial neural network in the forecasting of the airborne contaminant

Swift, versatile and a rigorous kinetic model based artificial neural network surrogate for single and multicomponent batch adsorption processes

DeepImpute: an accurate, fast, and scalable deep neural network method to impute single-cell RNA-seq data

Super-resolution reconstruction of remote sensing images based on convolutional neural network

Fast domain-aware neural network emulation of a planetary boundary layer parameterization in a numerical weather forecast model

A Real-Time Super-Resolution Method Based on Convolutional Neural Networks

Super-Resolution Reconstruction of Deep Residual Network with Multi-Level Skip Connections

Explicit Complementary Ensemble Structure for Improving the Uniformity of Image Recognition Classification Performance of Deep Neural Networks

A fast neural network approach for direct covariant forces prediction in complex multi-element extended systems
