Comparison of Image-Based and Text-Based Source Code Classification Using Deep Learning

Elife Ozturk Kiyak, Derya Birant, Ayse Betul Cengiz, Kokten Ulas Birant

doi:10.1007/s42979-020-00281-1

Abstract

Source code classification (SCC) is the task of assigning source code to categories according to a criterion such as functionality, programming language, or vulnerability. Many source code archives are organized by programming language, so that desired code fragments can be found easily by searching within the archive. However, having field experts organize source code archives manually is labor intensive and impractical because the volume of available source code is growing rapidly. Therefore, this study proposes new convolutional neural network (CNN) architectures to build source code classifiers that automatically identify the programming language of source code. This is the first study in which the performance of deep learning algorithms on programming language identification is compared on both image and text files. The experiments are performed on three source code datasets to identify eight programming languages: C, C++, C#, Go, Python, Ruby, Rust, and Java. The comparative results indicate that although the text-based and image-based SCC approaches achieve very high (> 93.5%) and similar accuracies, text-based classification performs significantly better in terms of execution time.
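To make the distinction between the two input types concrete, the following minimal sketch shows one way the same code fragment might be turned into a text-style input (a one-dimensional sequence) and an image-style input (a two-dimensional grid). The abstract does not describe how the inputs are prepared, so everything here is an assumption for illustration: the function names `encode_as_text` and `encode_as_image`, the fixed sizes, and the use of character codes as pixel intensities are hypothetical and are not the authors' method.

```python
from typing import List

# The eight languages named in the abstract (the class labels).
LANGUAGES = ["C", "C++", "C#", "Go", "Python", "Ruby", "Rust", "Java"]


def encode_as_text(source: str, max_len: int = 64) -> List[int]:
    """Hypothetical text-style input: one integer per character.

    The sequence is truncated or zero-padded to max_len so that every
    sample has the same length.
    """
    codes = [min(ord(ch), 255) for ch in source[:max_len]]
    return codes + [0] * (max_len - len(codes))


def encode_as_image(source: str, height: int = 8, width: int = 16) -> List[List[int]]:
    """Hypothetical image-style input: a height x width grid of intensities.

    Each source line becomes one row and each character one "pixel" in
    the range 0..255. This is a stand-in for an image of the code, not
    a rendering of it.
    """
    lines = source.splitlines()[:height]
    grid = []
    for row in range(height):
        line = lines[row] if row < len(lines) else ""
        pixels = [min(ord(ch), 255) for ch in line[:width]]
        grid.append(pixels + [0] * (width - len(pixels)))
    return grid


if __name__ == "__main__":
    snippet = 'fn main() {\n    println!("hi");\n}\n'
    text_input = encode_as_text(snippet)
    image_input = encode_as_image(snippet)
    print(len(text_input))                        # sequence length
    print(len(image_input), len(image_input[0]))  # grid height and width
```

In a setup like this, the text-style sequence would feed a one-dimensional CNN and the image-style grid a two-dimensional CNN, each predicting one of the eight labels in `LANGUAGES`.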

Journal: SN Computer Science
Publication Date: Aug 14, 2020
Citations: 6

