
Da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs

Abstract

Neural networks with a latency requirement on the order of microseconds, like the ones used at the CERN Large Hadron Collider, are typically deployed on FPGAs fully unrolled and pipelined. A bottleneck for the deployment of such neural networks is area utilization, which is directly related to the required constant matrix-vector multiplication (CMVM) operations. In this work, we propose an efficient algorithm for implementing CMVM operations with distributed arithmetic on FPGAs that simultaneously optimizes for area consumption and latency. The algorithm achieves resource reduction similar to state-of-the-art algorithms while being significantly faster to compute. The proposed algorithm is open sourced and integrated into the hls4ml library, a free and open source library for running real-time neural network inference on FPGAs. We show that the proposed algorithm can reduce on-chip resources by up to a third for realistic, highly quantized neural networks while simultaneously reducing latency, enabling the implementation of previously infeasible networks.
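To make the CMVM cost concrete, here is a minimal sketch of a multiplier-free constant matrix-vector product. It is not the algorithm proposed in the paper and does not use the hls4ml API; it only shows the naive baseline, in which each constant weight is rewritten as signed powers of two so that the product needs only shifts and additions. The function names `csd_digits` and `cmvm_shift_add`, and the use of adder count as an area proxy, are assumptions made for this illustration.

```python
def csd_digits(value: int) -> list[tuple[int, int]]:
    """Decompose an integer into signed powers of two (non-adjacent form).

    Returns (sign, shift) pairs such that value == sum(sign * 2**shift).
    """
    digits = []
    shift = 0
    while value != 0:
        if value & 1:
            # Pick +1 or -1 so that the next higher bit becomes zero.
            digit = 2 - (value & 3)
            digits.append((digit, shift))
            value -= digit
        value >>= 1
        shift += 1
    return digits


def cmvm_shift_add(matrix: list[list[int]], x: list[int]) -> tuple[list[int], int]:
    """Compute matrix @ x with shifts and adds only.

    Also returns a naive two-input adder count (terms per row minus one),
    used here as a rough, hypothetical proxy for area.
    """
    outputs = []
    adders = 0
    for row in matrix:
        # Every nonzero signed digit of every weight contributes one shifted term.
        terms = [
            (sign, shift, xi)
            for weight, xi in zip(row, x)
            for sign, shift in csd_digits(weight)
        ]
        acc = 0
        for sign, shift, xi in terms:
            acc += sign * (xi << shift)
        outputs.append(acc)
        adders += max(len(terms) - 1, 0)
    return outputs, adders


if __name__ == "__main__":
    weights = [[3, -5], [7, 0]]
    inputs = [2, 3]
    result, adder_count = cmvm_shift_add(weights, inputs)
    print(result, adder_count)
```

In this baseline every row is built independently, so the adder count grows with the number of nonzero signed digits in the matrix. Any approach that lowers that count, for example by reusing partial sums across rows, would reduce the area that the abstract identifies as the bottleneck.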

Similar Papers
  • Conference Article
  • Cited by 47
  • 10.1145/3495243.3560551
Real-time neural network inference on extremely weak devices
  • Oct 14, 2022
  • Kai Huang + 1 more

With the wide adoption of AI applications, there is a pressing need of enabling real-time neural network (NN) inference on small embedded devices, but deploying NNs and achieving high performance of NN inference on these small devices is challenging due to their extremely weak capabilities. Although NN partitioning and offloading can contribute to such deployment, they are incapable of minimizing the local costs at embedded devices. Instead, we suggest to address this challenge via agile NN offloading, which migrates the required computations in NN offloading from online inference to offline learning. In this paper, we present AgileNN, a new NN offloading technique that achieves real-time NN inference on weak embedded devices by leveraging eXplainable AI techniques, so as to explicitly enforce feature sparsity during the training phase and minimize the online computation and communication costs. Experiment results show that AgileNN's inference latency is >6x lower than the existing schemes, ensuring that sensory data on embedded devices can be timely consumed. It also reduces the local device's resource consumption by >8x, without impairing the inference accuracy.

  • Research Article
  • Cited by 17
  • 10.3390/s23187701
Neuromorphic Sentiment Analysis Using Spiking Neural Networks
  • Sep 6, 2023
  • Sensors (Basel, Switzerland)
  • Raghavendra K Chunduri + 1 more

Over the past decade, the artificial neural networks domain has seen a considerable embracement of deep neural networks among many applications. However, deep neural networks are typically computationally complex and consume high power, hindering their applicability for resource-constrained applications, such as self-driving vehicles, drones, and robotics. Spiking neural networks, often employed to bridge the gap between machine learning and neuroscience fields, are considered a promising solution for resource-constrained applications. Since deploying spiking neural networks on traditional von Neumann architectures requires significant processing time and high power, typically, neuromorphic hardware is created to execute spiking neural networks. The objective of neuromorphic devices is to mimic the distinctive functionalities of the human brain in terms of energy efficiency, computational power, and robust learning. Furthermore, natural language processing, a machine learning technique, has been widely utilized to aid machines in comprehending human language. However, natural language processing techniques also cannot be deployed efficiently on traditional computing platforms. In this research work, we strive to enhance the natural language processing traits/abilities by harnessing and integrating the SNNs traits, as well as deploying the integrated solution on neuromorphic hardware, efficiently and effectively. To facilitate this endeavor, we propose a novel, unique, and efficient sentiment analysis model created using a large-scale SNN model on SpiNNaker neuromorphic hardware that responds to user inputs. SpiNNaker neuromorphic hardware typically can simulate large spiking neural networks in real time and consumes low power. We initially create an artificial neural networks model, and then train the model using an Internet Movie Database (IMDB) dataset.
Next, the pre-trained artificial neural networks model is converted into our proposed spiking neural networks model, called a spiking sentiment analysis (SSA) model. Our SSA model using SpiNNaker, called SSA-SpiNNaker, is created in such a way to respond to user inputs with a positive or negative response. Our proposed SSA-SpiNNaker model achieves 100% accuracy and only consumes 3970 Joules of energy, while processing around 10,000 words and predicting a positive/negative review. Our experimental results and analysis demonstrate that by leveraging the parallel and distributed capabilities of SpiNNaker, our proposed SSA-SpiNNaker model achieves better performance compared to artificial neural networks models. Our investigation into existing works revealed that no similar models exist in the published literature, demonstrating the uniqueness of our proposed model. Our proposed work would offer a synergy between SNNs and NLP within the neuromorphic computing domain, in order to address many challenges in this domain, including computational complexity and power consumption. Our proposed model would not only enhance the capabilities of sentiment analysis but also contribute to the advancement of brain-inspired computing. Our proposed model could be utilized in other resource-constrained and low-power applications, such as robotics, autonomous, and smart systems.

  • Book Chapter
  • Cited by 5
  • 10.1007/978-3-030-99108-1_34
Real-Time Image Analysis with Neural Networks on Industrial Controllers for Individualized Production
  • Jan 1, 2022
  • Christoph Wree + 4 more

Manufacturing systems for individualized production require workflows depending on individual objects. Machine learning (ML) offers the possibility to classify different objects by training a neural network. Depending on the output values of the network, decisions for the following production step can then be controlled. The question arises whether it is possible to execute the neural network in real time in coordination with the machine and motion control tasks. In this paper, this question is investigated using a programmable logic controller (PLC) runtime environment on a standard industrial PC. The execution times of different neural network implementation methods are measured and compared. The fastest neural network requires an average execution time of only 54 µs. The characteristics of the different methods with respect to training and implementing the neural networks in the controller are also discussed.
Keywords: Machine learning; Intelligent Manufacturing Systems; Individualized production

  • Conference Article
  • Cited by 5
  • 10.1109/csit56902.2022.10000558
A chatbot of a person’s emotional state using a neural network
  • Nov 10, 2022
  • Maherus Anastasiia + 1 more

Implementation of the project in the form of a chatbot of a person’s emotional state using a neural network. The work describes in detail the process of implementing a chatbot for human mental health using a neural network. Using this application allows the user to improve his emotional state in real time. With the help of surveys in the form of questions about the state of his emotional health, the application analyzes the user’s behavior for the presence of psychological deviations. Thanks to this, you can track emotional disorders. This interaction takes place in the format of a dialogue between human and artificial intelligence. The central goal of the development is to enable people suffering from emotional disorders to improve their mental health in real time. A methodology for determining a person’s emotional state using a neural network in real time has been developed. A chatbot goes through several steps in the process of recognizing an emotional state. The first is cleaning the data for further suitable analysis. Next is the process of identifying the emotional color in each word and calculating the rating of each emotion. The output is presented in the form of a dictionary.

  • Research Article
  • Cited by 1
  • 10.3390/computers7030043
An Analytical Comparison of Locally-Connected Reconfigurable Neural Network Architectures Using a C. elegans Locomotive Model
  • Aug 15, 2018
  • Computers
  • Jonathan Graham-Harper-Cater + 2 more

The scale of modern neural networks is growing rapidly, with direct hardware implementations providing significant speed and energy improvements over their software counterparts. However, these hardware implementations frequently assume global connectivity between neurons and thus suffer from communication bottlenecks. Such issues are not found in biological neural networks. It should therefore be possible to develop new architectures to reduce the dependence on global communications by considering the connectivity of biological networks. This paper introduces two reconfigurable locally-connected architectures for implementing biologically inspired neural networks in real time. Both proposed architectures are validated using the segmented locomotive model of the C. elegans, performing a demonstration of forwards, backwards serpentine motion and coiling behaviours. Local connectivity is discovered to offer up to a 17.5× speed improvement over hybrid systems that use combinations of local and global infrastructure. Furthermore, the concept of locality of connections is considered in more detail, highlighting the importance of dimensionality when designing neuromorphic architectures. Convolutional Neural Networks are shown to map poorly to locally connected architectures despite their apparent local structure, and both the locality and dimensionality of new neural processing systems is demonstrated as a critical component for matching the function and efficiency seen in biological networks.

  • Research Article
  • Cited by 14
  • 10.3901/cjme.2009.02.282
Application of Smith Predictor Based on Single Neural Network in Cold Rolling Shape Control
  • Jan 1, 2009
  • Chinese Journal of Mechanical Engineering
  • Yiqun Wang

Flatness is one of the most important criterion factors to evaluate the quality of the steel strip. To improve the strip’s flatness quality, the most frequently used methodology is to employ the closed-loop automatic shape control system. However, in the shape control system, the shape-meter is always installed at the down way of the exit of the cold rolling mill and can not sense the changes of the strip flatness in the rolling gap directly. This kind of installation results in the delay of the feedback in the control system. Therefore, the stability and response performance of the system are strongly affected by the delay. At present, there is still no mature way to design controllers for systems with time delay. Although the conventional PID controller used in most practical applications has the capability to compensate the delay, the effect of the compensation is limited, especially for the systems with long time delay. Smith predictor, as a compensator for solving this problem, is now widely used in industry systems. However, the request of highly precise model of the system and the poor adaptive performance to the changes of related parameters limit the application of the Smith predictor in practice. In order to overcome the drawbacks of the Smith predictor, a new Smith predictor based on single neural network PID (SNN-PID) is proposed. Because the single neural network is employed into the Smith predictor to improve the controller’s self-adaptability, the adaptive capability to the varying parameters of the system is improved. Meanwhile, for the purpose of solving the problems such as time-consuming and complicated calculation of the neural networks in real time, the learning coefficient of neural network is divided into several stages as usually done in expert control system. Therefore, the control system can obtain fast response due to the improved calculation speed of the neural networks. 
In order to validate the performance of the proposed controller, the experiment is conducted on the shape control system in a 300 mm four-high reversing cold rolling mill. The experimental results show that the SNN-PID with Smith predictor controller can effectively compensate the delay effects and achieve better control performance than the conventional PID controller.

  • Research Article
  • 10.21303/2585-6847.2017.00469
DEVELOPMENT OF THE METHOD OF UNSUPERVISED TRAINING OF CONVOLUTIONAL NEURAL NETWORKS BASED ON NEURAL GAS MODIFICATION
  • Nov 23, 2017
  • Technology transfer: fundamental principles and innovative technical solutions
  • Viacheslav Moskalenko

Technologies for computer analysis of visual information based on convolutional neural networks have been widely used, but there is still a shortage of working algorithms for continuous unsupervised training and re-training of neural networks in real time, limiting the effectiveness of their functioning under conditions of nonstationarity and a priori uncertainty. In addition, the back propagation method for learning multi-layer neural networks requires significant computational resources and the amount of marked learning data, which makes it difficult to implement them in autonomous systems with limited resources. One approach to reducing the computational complexity of deep machine learning and overfitting is use of the neural gas principles to implement learning in the process of direct information propagation and sparse coding to increase the compactness and informativeness of feature representation. The paper considers the use of sparse coding neural gas for learning ten layers of the VGG-16 neural network on selective data from the ImageNet database. At the same time, it is suggested that the evaluation of the effectiveness of the feature extractor learning be carried out according to the results of so-called information-extreme machine learning with the teacher of the output classifier. Information-extreme learning is based on the principles of population optimization methods for binary coding of observations and the construction of radial-basic decision rules optimal in the information criterion in the binary Hamming space. According to the results of physical modeling, it is shown that learning without a teacher ensures the accuracy of decision rules to 96.4 %, which is inferior to the accuracy of learning with the teacher, which is equal to 98.7 %. 
However, the absence of an error in the training algorithm for the backward propagation of the error causes the prospect of further research towards the development of meta-optimization algorithms to refine the feature extractor's filters and parameters of the unsupervised training algorithm.

  • Conference Article
  • 10.1109/icsigp.1996.571138
Shortest paths computation of graph by neural networks in real time
  • Oct 14, 1996
  • Zhu Daming + 1 more

A new kind of neural networks on solving shortest paths problem is presented, and the stability of the neural networks is proved. For a directed or undirected graph, the proposed neural network is always convergent to its single equilibrium, so that the shortest paths between every pair of nodes of the oriented graph are produced. This paper makes a new study on solving the optimization but not NP-hard problems by neural networks.

  • Research Article
  • Cited by 20
  • 10.1016/j.asoc.2014.02.015
Dynamic classification of ballistic missiles using neural networks and hidden Markov models
  • Mar 11, 2014
  • Applied Soft Computing
  • Upendra Kumar Singh + 2 more

  • Research Article
  • Cited by 5
  • 10.1002/aisy.202200105
An Adaptive Intelligent System Based on Energy‐Efficient Synaptic Resistor Circuits with Fast Real‐Time Learning
  • Aug 5, 2022
  • Advanced Intelligent Systems
  • Rahul Shenoy + 8 more

Unlike the human brain, which concurrently executes inference and learning algorithms in neural networks in real time, artificial intelligence (AI) systems usually execute inference algorithms and learning algorithms in series, which lack fast real‐time learning functionality, high computing energy efficiency, and adaptability in the complex, erratic real world. Herein, an intelligent system integrating a drone and a synaptic resistor (synstor) circuit that concurrently executes inference and reinforcement learning algorithms in real‐time is reported. Without any prior learning or programming, the conductance matrix of the synstor circuit is dynamically optimized in its real‐time learning processes, thus enabling the drone to adapt and fly toward its target positions in erratic aerodynamic environments. In learning experiments involving a drone driven by synstor circuits, humans, or computers, the real‐time learning by the synstor circuit is superior to the real‐time learning by humans and the cloud learning by computers, in terms of key benchmarks including adaptability, learning time, precision, power consumption, and energy efficiency. By circumventing the fundamental limitations in computers, synstor circuits open up new directions to establish AI systems with brain‐like fast real‐time learning functionality, high computing energy efficiency, and adaptability in complex, erratic real‐world environments for versatile applications.

  • Conference Article
  • 10.1109/sami.2008.4469183
Reduction of Visual Information in Neural Network Learning Process Visualization
  • Jan 1, 2008
  • Matus Uzak + 3 more

Visualization of the learning of neural network faces the problem of dealing with overwhelming amount of visual information. This paper describes the application of clustering methods for reduction of visual information in the response function visualization. When only clusters of neurons are visualized, instead of direct visualization of responses of all neurons in the network, the amount of visually presented information can be significantly reduced. This is useful for reducing user fatigue and also for minimizing the visualization equipment requirements. We show, that application of Kohonen network or growing neural gas with utility factor algorithm allows to visualize the learning of moderate-sized neural networks in real time. Comparison of both algorithms in this task is provided, also with performance analysis and example results of response function visualization.

  • Research Article
  • Cited by 1
  • 10.31498/2225-6733.41.2020.226118
Face recognition using a neural network
  • Dec 24, 2020
  • Вісник Приазовського Державного Технічного Університету. Серія: Технічні науки
  • O.I Pronina + 2 more

Modern trends in security and development of information technologies push forward all spheres of human life. The task of isolating a human face in a natural or artificial setting and subsequent identification has always been among the highest priority tasks for researchers working in the field of machine vision systems and artificial intelligence. In addition, the task of recognition is very relevant in the field of security, both for storing data and for finding criminals on surveillance cameras, and so on. In addition, all recognition systems use neural networks to improve performance, increase efficiency and facilitate the process itself. However, at present, despite the similarity of tasks and methods used in the development of alternative systems for biometric identification of a person, such as identification by fingerprints or by the image of the iris, the identification systems by the image of the face are significantly inferior to the above systems. Therefore, there are many paths to improving face recognition systems. In the work, an analysis of the published literature and of existing algorithms used in face recognition and human identification was carried out. The main method of face recognition is the use of a convolutional neural network, the selection of objects in the image is carried out using the Viola-Jones method, the AdaBoost machine learning algorithm is used, and the Haar classifier is most often used as a classifier. The article is devoted to the creation of software for face recognition using a convolutional neural network in real time. The software can recognize and identify a person with head tilt and under different lighting conditions. In this case, sampling training for the model is carried out on a limited number of photographs. Experimental studies were carried out to test the developed mathematical model and the real-time face recognition algorithm.

  • Conference Article
  • Cited by 1
  • 10.1109/icpr48806.2021.9412633
Porting A Convolutional Neural Network For Stereo Vision In Hardware
  • Jan 10, 2021
  • D - Od G Sotiropoulos + 1 more

With the leaps of progress done in the field of machine learning through the last few years, Artificial Neural Networks (ANN) are being used in more and more applications. In the field of computer vision, applications of ANNs include object recognition, motion and object tracking, and obstacle avoidance. Alternatively, ANNs are used to find the solutions of costly problems such as the construction of a depth map for stereoscopic vision. Significant research has been done using Field Programmable Gate Arrays (FPGAs) to accelerate the simulation of ANNs and achieve real-time execution. We seek to develop optimized hardware for embedded systems in order to run pretrained neural networks in real time. In this paper we analyze, reconstruct and reevaluate a pretrained convolutional neural network for stereo matching and develop a hardware architecture to be used in an FPGA so as to compute the stereo estimation of still images in real time in hardware.

  • Research Article
  • 10.36930/40300517
Neural network for recognition and classification of cartographic images of soil massifs
  • Nov 3, 2020
  • Scientific Bulletin of UNFU
  • V V Zhukovskyy + 2 more

A neural network is proposed for recognizing cartographic images of soil massifs and classifying landscape areas by soil massif type. Approaches to designing the architecture, the learning methods, and the preparation of data for learning, training, and testing the neural network are described. A structural and functional scheme of the neural network is developed, consisting of an input layer, hidden layers, and an output layer; each individual neuron is described by a corresponding activation function with selected weight coefficients. The suitability of the chosen number of neurons, their type, and the architecture for recognizing and classifying plots on cadastral maps is shown. Open state information resources were used as source data; individual plots were identified in them by soil type and distribution, and a database was formed for learning and training the neural network. The efficiency, speed, and accuracy of the neural network are analyzed; in particular, computer simulation using modern software and mathematical modeling of the computational processes inside the network structure were carried out. Software tools are developed for preliminary preparation and processing of the input data, for subsequent training of the neural network, and for the recognition and classification process itself. According to the results obtained, the developed model and structure of the neural network and its software implementation show high efficiency both at the data pre-processing stage and, overall, at the stage of classifying and delineating target areas of soil massifs. The next stage of the research is the development and integration of a hardware-software system based on parallelized and partially parallelized computing hardware, which will make it possible to significantly speed up computation and to carry out neural network learning and training in real time without loss of accuracy. The presented scientific and practical results have high potential for integration into modern information-analytical systems and into systems for analyzing and monitoring the state of the environment, technological facilities, and industrial facilities.

  • Research Article
  • Cited by 84
  • 10.1016/j.neuro.2016.03.019
A multi-laboratory evaluation of microelectrode array-based measurements of neural network activity for acute neurotoxicity testing
  • Mar 29, 2016
  • NeuroToxicology
  • Andrea Vassallo + 15 more
