Abstract

As a key technology for tobacco leaf harvesting, automatic identification of the field maturity of tobacco leaves is an important issue in the tobacco industry. For years, researchers have explored machine learning methods for this identification task, and traditional supervised learning (SL) schemes have been studied in depth. In practice, however, deploying an SL algorithm is costly and cumbersome, because collecting and labeling enough tobacco leaf samples requires substantial labor and time. The industry therefore needs a more cost-effective solution. To this end, we investigate the efficacy and efficiency of a semi-supervised learning (SSL) scheme for this task, aiming to substantially reduce the labeling cost and to provide a flexible way of controlling it. Given the availability and low cost of smartphones, we use smartphone photography for image acquisition and build a tobacco leaf maturity dataset of more than 7000 images covering three field maturity categories (unripe, ripe and overripe) at three stem positions (bottom, middle and top). Since maturity increases sequentially from unripe to ripe to overripe leaves, we propose a maturity structure constraint (MSC) that drives the identification model to converge to a confident space and to generalize well. Furthermore, we propose a sample selection method based on self-supervised representation learning, which can effectively select high-quality samples for the identification network. Finally, we design an SSL framework called maturity structure constraint based semi-supervised learning (MSC-SSL) for the identification task. Experimental results show that our SSL framework significantly reduces sample usage while improving identification accuracy.
With automatic stopping, it achieves accuracies of 91.92% at the bottom position (using 33.16% of the training samples), 87.06% at the middle position (27.57%) and 87.64% at the top position (28.02%), compared with 87.37% and 90.40%, 83.53% and 85.10%, and 81.65% and 84.64%, respectively, for the two SSL benchmarks. Moreover, our method can match or exceed the accuracy of the SL scheme while using only about 25% of the labeled samples that the SL scheme requires.
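
The abstract does not state how MSC is formulated. Purely to illustrate how the ordering unripe < ripe < overripe could be turned into a training signal, the sketch below uses a swapped-in technique, a unimodality penalty over ordered classes, which is not the authors' MSC. The function name `unimodality_penalty` and the class indexing (0 = unripe, 1 = ripe, 2 = overripe) are hypothetical. The penalty is zero when predicted probabilities rise up to the most likely class and fall after it, and positive for distributions that, for example, favor both unripe and overripe while assigning little probability to ripe.

```python
import numpy as np

# Hypothetical class order (assumption): 0 = unripe, 1 = ripe, 2 = overripe.
def unimodality_penalty(probs: np.ndarray) -> np.ndarray:
    """Per-sample penalty for predictions that are not unimodal over the
    ordered maturity classes. Illustrative ordering constraint only; this
    is NOT the MSC defined in the paper."""
    probs = np.asarray(probs, dtype=float)
    num_classes = probs.shape[1]
    peak = probs.argmax(axis=1)                 # most likely class per sample
    diffs = probs[:, 1:] - probs[:, :-1]        # p[j+1] - p[j], shape (n, K-1)
    step = np.arange(num_classes - 1)[None, :]  # index j of each step
    # Steps left of the peak should rise (diff >= 0);
    # steps at or right of the peak should fall (diff <= 0).
    left_of_peak = step < peak[:, None]
    violation = np.where(left_of_peak,
                         np.maximum(0.0, -diffs),
                         np.maximum(0.0, diffs))
    return violation.sum(axis=1)

# Usage: only the first row (unripe and overripe both likely, ripe unlikely)
# receives a nonzero penalty, of about 0.2.
probs = np.array([[0.6, 0.1, 0.3],
                  [0.2, 0.7, 0.1],
                  [0.5, 0.3, 0.2]])
print(unimodality_penalty(probs))
```

In an SSL setting, such a penalty could be averaged over unlabeled predictions and added to the supervised loss with a weighting factor; whether MSC resembles this cannot be determined from the abstract.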