Abstract

Small and massively imbalanced datasets are long-standing problems in medical image classification. Researchers have traditionally addressed these problems with pre-trained models, but such models typically have a huge number of trainable parameters: small datasets are insufficient to train them adequately, and imbalanced datasets readily lead to overfitting on the classes with more samples. Multiple-stream networks that learn a variety of features have recently gained popularity. Therefore, in this work, a quad-stream hybrid model called QuadSNet, built from conventional as well as separable convolutional neural networks, is proposed to achieve better performance on small and imbalanced datasets without using any pre-trained model. The designed model extracts hybrid features, and the fusion of these features makes it more robust on heterogeneous data. In addition, a weighted margin loss is used to handle the problem of class imbalance. QuadSNet is trained and tested on seven different classification datasets. To evaluate its advantages on small and massively imbalanced data, it is compared with six state-of-the-art pre-trained models on three benchmark datasets for pneumonia, COVID-19, and cancer classification. To assess its performance on general classification datasets, it is compared with the best model on each of the remaining four datasets, which contain larger, balanced, grayscale, color, or non-medical image data. The results show that, with far fewer parameters, QuadSNet handles class imbalance and overfitting on small datasets better than existing pre-trained models, while remaining competitive on the general datasets.
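To make the two central ideas concrete, the sketch below shows one plausible reading of them in TensorFlow/Keras: four parallel streams (conventional and depthwise-separable convolutions) fused by concatenation, trained with a class-weighted, hinge-style margin loss. All layer sizes, stream depths, the margin value, the example class weights, and the helper names (`conv_stream`, `sep_stream`, `weighted_margin_loss`) are illustrative assumptions, not the authors' exact QuadSNet architecture or loss formulation.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_stream(x, filters):
    # Conventional convolutional stream.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    return layers.GlobalAveragePooling2D()(x)

def sep_stream(x, filters):
    # Depthwise-separable convolutional stream (fewer parameters).
    x = layers.SeparableConv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    return layers.GlobalAveragePooling2D()(x)

inputs = layers.Input(shape=(224, 224, 3))
# Four parallel streams (two conventional, two separable) whose pooled
# features are fused by concatenation before classification.
fused = layers.Concatenate()([
    conv_stream(inputs, 32), conv_stream(inputs, 64),
    sep_stream(inputs, 32), sep_stream(inputs, 64),
])
outputs = layers.Dense(2, activation="softmax")(fused)
model = Model(inputs, outputs)

def weighted_margin_loss(class_weights, margin=0.5):
    # Hinge-style penalty on the gap between the true-class probability
    # and the highest other-class probability, scaled per sample by a
    # class weight (e.g. inverse class frequency) so that errors on the
    # minority class cost more. Assumes one-hot labels.
    w = tf.constant(class_weights, dtype=tf.float32)
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, tf.float32)
        true_score = tf.reduce_sum(y_true * y_pred, axis=-1)
        other_best = tf.reduce_max((1.0 - y_true) * y_pred, axis=-1)
        sample_w = tf.reduce_sum(y_true * w, axis=-1)
        hinge = tf.nn.relu(margin - (true_score - other_best))
        return tf.reduce_mean(sample_w * hinge)
    return loss

# Hypothetical usage: weight the minority (positive) class 3x.
model.compile(optimizer="adam", loss=weighted_margin_loss([1.0, 3.0]))
```

The design intent is that the conventional streams and the parameter-light separable streams learn complementary feature types, and concatenation lets the classifier draw on both.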

Introduction

Training neural networks for natural and medical image classification requires a huge amount of data. Along with the scarcity of samples, medical image datasets are generally massively imbalanced and possess very few positive cases. Transfer learning [5], few-shot learning [18], zero-shot learning [6], Siamese networks [8], network ensembles [4], and, most recently, algorithms based on generative adversarial networks [2] have been applied to small and imbalanced datasets. These algorithm-level approaches commonly rely on pre-trained models. Despite decent progress, pre-trained models conventionally use millions of parameters and complex architectures to achieve competitive results, which is far more than a small dataset can train adequately.
