Abstract

Convolutional Neural Networks (CNNs) were originally created for image classification tasks but were quickly applied to other domains, including Natural Language Processing (NLP). Nowadays, solutions based on artificial intelligence appear on mobile devices and in embedded systems, which places constraints on, among other things, memory and power consumption. Due to the memory and computing requirements of CNNs, they need to be compressed before they can be mapped to hardware. This paper presents the results of compressing efficient CNNs for sentiment analysis. The main steps involve pruning and quantization. The process of mapping the compressed network to an FPGA and the results of this implementation are described. The conducted simulations showed that a 5-bit width is enough to ensure no drop in accuracy compared to the floating-point version of the network. Additionally, the memory footprint was significantly reduced (by 85% to 93% compared to the original model).
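To illustrate the kind of reduced-precision representation the 5-bit result refers to, the sketch below shows symmetric uniform quantization of a weight tensor in Python/NumPy. The function name and the max-absolute-value scaling are assumptions for illustration; the paper's exact quantization procedure is not detailed in this summary.

    import numpy as np

    def quantize_uniform(weights, n_bits=5):
        # Symmetric uniform quantization to n_bits signed integer codes
        # (hypothetical sketch; the paper's per-layer scheme may differ).
        qmax = 2 ** (n_bits - 1) - 1            # 15 for 5-bit signed codes
        max_abs = np.max(np.abs(weights))
        if max_abs == 0:                        # all-zero tensor: nothing to quantize
            return np.zeros(weights.shape, dtype=np.int8), 1.0
        scale = max_abs / qmax                  # map the largest magnitude to qmax
        codes = np.clip(np.round(weights / scale), -qmax, qmax)
        return codes.astype(np.int8), scale     # 5-bit codes fit in int8 storage

Accuracy under such a scheme is typically checked by dequantizing (w_hat = codes * scale) and re-running inference, which is how a "no drop in accuracy" claim at a given bit width can be verified in simulation.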

Highlights

  • Natural language processing (NLP) is considered to be one of three main application domains of deep learning

  • This paper presents the results of compressing efficient Convolutional Neural Networks (CNNs) for sentiment analysis

  • This work focuses on the intersection of CNNs, natural language processing (NLP), sentiment analysis, and neural model compression, which may be considered a subfield of the emerging embedded machine-learning domain

Summary

Introduction

Natural language processing (NLP) is considered to be one of the three main application domains of deep learning (along with image and video processing). There is an increasing need to run it on mobile devices for applications such as translation, voice typing, or image-to-text conversion. It is worth noting that, despite an abundance of research effort on compressing deep learning architectures for image processing [2, 6, 19, 20], there are only a few projects aimed at compressing neural architectures for NLP [1, 5, 10, 15]. In this work, after pruning and quantization were applied, the model was deployed to an FPGA platform. This allowed the feasibility and efficiency of FPGAs in the domain of embedded neural computation to be examined. The impact of quantization and pruning on the accuracy of the analyzed neural architecture [9] was examined separately for each layer; a minimal sketch of the pruning step appears below.
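To make the pruning step concrete, the following is a minimal sketch of per-layer magnitude pruning in Python/NumPy. The sparsity-targeted threshold used here is a common approach and an assumption for illustration; the paper's exact per-layer pruning criterion is not detailed in this summary.

    import numpy as np

    def prune_by_magnitude(weights, sparsity=0.9):
        # Zero out the smallest-magnitude weights so that roughly `sparsity`
        # of the entries become zero (illustrative only; the paper's
        # criterion may differ).
        k = int(sparsity * weights.size)
        if k == 0:
            return weights.copy(), np.ones(weights.shape, dtype=bool)
        flat = np.abs(weights).ravel()
        threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
        mask = np.abs(weights) > threshold            # ties at the threshold are dropped too
        return weights * mask, mask

The returned mask would typically be kept alongside the weights so that any subsequent fine-tuning or quantization touches only the surviving connections.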
