Abstract

Convolutional Neural Networks (CNNs) were originally created for image classification tasks but were quickly applied to other domains, including Natural Language Processing (NLP). Nowadays, solutions based on artificial intelligence appear on mobile devices and in embedded systems, which places constraints on, among other things, memory and power consumption. Due to the memory and computing requirements of CNNs, they need to be compressed before they can be mapped to hardware. This paper presents the results of compressing efficient CNNs for sentiment analysis. The main steps involve pruning and quantization. The process of mapping the compressed network to an FPGA and the results of this implementation are described. The conducted simulations showed that a 5-bit width is enough to ensure no drop in accuracy compared to the floating-point version of the network. Additionally, the memory footprint was significantly reduced (by 85% to 93% compared to the original model).
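To illustrate the kind of reduced-precision representation the 5-bit result refers to, the sketch below shows symmetric uniform quantization of a weight tensor in Python/NumPy. The function name and the max-absolute-value scaling are assumptions for illustration; the paper's exact quantization procedure is not detailed in this summary.

    import numpy as np

    def quantize_uniform(weights, n_bits=5):
        # Symmetric uniform quantization to n_bits signed integer codes
        # (hypothetical sketch; the paper's per-layer scheme may differ).
        qmax = 2 ** (n_bits - 1) - 1            # 15 for 5-bit signed codes
        max_abs = np.max(np.abs(weights))
        if max_abs == 0:                        # all-zero tensor: nothing to quantize
            return np.zeros(weights.shape, dtype=np.int8), 1.0
        scale = max_abs / qmax                  # map the largest magnitude to qmax
        codes = np.clip(np.round(weights / scale), -qmax, qmax)
        return codes.astype(np.int8), scale     # 5-bit codes fit in int8 storage

Accuracy under such a scheme is typically checked by dequantizing (w_hat = codes * scale) and re-running inference, which is how a "no drop in accuracy" claim at a given bit width can be verified in simulation.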

Highlights

  • Natural language processing (NLP) is considered to be one of three main application domains of deep learning

  • This paper presents the results of compressing efficient Convolutional Neural Networks (CNNs) for sentiment analysis

  • This work focuses on the intersection of CNNs, natural language processing (NLP), sentiment analysis, and neural model compression, which may be considered a subfield of the emerging embedded machine-learning domain

Summary

Introduction

Natural language processing (NLP) is considered to be one of the three main application domains of deep learning (along with image and video processing). There is an increasing need to run it on mobile devices for applications such as translation, voice typing, or image-to-text conversion. It is worth noting that, despite an abundance of research effort on compressing deep learning architectures for image processing [2, 6, 19, 20], there are only a few projects aimed at compressing neural architectures for NLP [1, 5, 10, 15]. In this work, after pruning and quantization were applied, the model was deployed to an FPGA platform. This allowed the feasibility and efficiency of FPGAs in the domain of embedded neural computation to be examined. The impact of quantization and pruning on the accuracy of the analyzed neural architecture [9] was examined separately for each layer; a minimal sketch of the pruning step appears below.
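To make the pruning step concrete, the following is a minimal sketch of per-layer magnitude pruning in Python/NumPy. The sparsity-targeted threshold used here is a common approach and an assumption for illustration; the paper's exact per-layer pruning criterion is not detailed in this summary.

    import numpy as np

    def prune_by_magnitude(weights, sparsity=0.9):
        # Zero out the smallest-magnitude weights so that roughly `sparsity`
        # of the entries become zero (illustrative only; the paper's
        # criterion may differ).
        k = int(sparsity * weights.size)
        if k == 0:
            return weights.copy(), np.ones(weights.shape, dtype=bool)
        flat = np.abs(weights).ravel()
        threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
        mask = np.abs(weights) > threshold            # ties at the threshold are dropped too
        return weights * mask, mask

The returned mask would typically be kept alongside the weights so that any subsequent fine-tuning or quantization touches only the surviving connections.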
