Abstract

Highlights:

• Overall features are described by capturing apple images from multiple perspectives.
• A lightweight CNN makes the model more practical.
• Spatial apple features are aggregated by a bidirectional long short-term memory network.
• Performance and efficiency are excellent, with 99.23% accuracy.

Apple grading is a typical task in food grading. For image-based apple grading, the variety of apple shapes and the incompleteness of single-view visual information make it difficult to achieve high grading accuracy. This paper proposes a novel multi-view spatial network for the apple grading task, which incorporates apple size information as one of the grading criteria, to address these problems. Specifically, we first use well-pretrained lightweight CNNs to extract low-level features from each view of the apple. Second, we formulate a spatial feature aggregation module built on a bidirectional LSTM and mean-pooling to exploit the correlative information of the apple across multiple views. In addition, we elaborately design a multi-view apple data collection scheme to construct an apple grading dataset. The experimental results indicate that the Multi-View Spatial Network reaches 99.23% accuracy, which is quite promising for the apple grading task and outperforms several state-of-the-art grading methods.
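The pipeline sketched in the abstract (per-view lightweight CNN features, aggregated across views by a bidirectional LSTM with mean-pooling, then classified into grades) could look roughly as follows in PyTorch. This is a minimal illustrative sketch, not the authors' implementation: the tiny convolutional backbone stands in for their well-pretrained lightweight CNN, and all layer sizes, the number of views, and the number of grade classes are assumptions.

```python
# Hypothetical sketch of the multi-view spatial network described above.
# The backbone, feature sizes, view count, and grade count are assumptions.
import torch
import torch.nn as nn

class MultiViewSpatialNet(nn.Module):
    def __init__(self, num_views=4, feat_dim=128, hidden=64, num_grades=4):
        super().__init__()
        # Stand-in for the lightweight CNN feature extractor; in practice
        # this would be a well-pretrained compact backbone.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # Bidirectional LSTM treats the per-view features as a sequence,
        # letting each view's representation attend to the others.
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_grades)

    def forward(self, views):
        # views: (batch, num_views, 3, H, W)
        b, v = views.shape[:2]
        # Run the shared backbone over every view in one batch.
        feats = self.backbone(views.flatten(0, 1)).view(b, v, -1)
        seq, _ = self.bilstm(feats)       # (batch, num_views, 2*hidden)
        pooled = seq.mean(dim=1)          # mean-pooling over views
        return self.classifier(pooled)    # grade logits
```

A usage call would pass a batch of multi-view image stacks, e.g. `MultiViewSpatialNet()(torch.randn(2, 4, 3, 64, 64))`, yielding one logit vector of grade scores per apple.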
