Abstract

Summary form only given; the complete presentation was not made available for publication as part of the conference proceedings. Machine Learning (ML) models hold a significant place in the ever-growing field of Artificial Intelligence (AI). There are several ML approaches whose performance varies with the nature of the data and the type of application. Until the early 2000s, Artificial Neural Networks (ANN) were considered the prime ML model, advantageous in nearly every respect. Then, in the mid 2000s, a new variant called "Deep Learning" (DL) gained significant attention and gradually began side-lining the conventional ANN; the entire field started moving towards DL. What is the difference between conventional ANN and DL models? The answer from almost everyone will be the same: an increase in performance measures. But how, and why? One rarely receives an answer to this second question, and this work tries to provide one. There are two prime differences: (1) the hand-crafted features on which an ANN depends are eliminated in DL models, and (2) the number of layers in DL architectures is significantly higher. Feature extraction is the process normally used to extract the significant features of any data. However, when a human user lacks expertise in the data, the hand-crafted features may be irrelevant, which leads to the inferior performance of the ANN. Hence, blaming the conventional ANN for weak performance may not be entirely fair. Instead, what if we give complete responsibility for feature extraction to the algorithm/model itself? This is the main reason for the increased adoption of DL models. How do DL models extract features on their own? As a case study, Convolutional Neural Networks (CNN) depend on a large number of convolutional layers for feature extraction: the initial layers extract low-level features, and the higher layers extract high-level features. Thus, one must pay the price for the increased performance in the form of a huge number of layers, which leads to computational complexity. This raises the question of whether the performance measure or the computational complexity matters more. On the other hand, it opens plenty of opportunities for research in DL to solve the complexity problem without sacrificing performance.
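
To make the CNN case study concrete, the following is a minimal sketch (assuming PyTorch; the model name TinyCNN and its layer sizes are purely illustrative, not from the source) of how stacked convolutional layers take over feature extraction from the human designer, and how each added layer raises the parameter count and hence the computational cost:

```python
# A minimal illustrative sketch, assuming PyTorch is available.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            # Early layers: small receptive fields capture low-level
            # features such as edges and simple textures.
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            # Deeper layers: larger effective receptive fields compose
            # earlier outputs into higher-level features (shapes, parts).
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)       # learned feature extraction, no hand-crafting
        x = torch.flatten(x, 1)
        return self.classifier(x)  # classification head

model = TinyCNN()
params = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {params}")  # grows with every layer added
```

Counting the trainable parameters, as done in the last two lines, is one simple way to see the trade-off the abstract describes: every convolutional layer added to improve performance also adds weights that must be stored and computed.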
