Identifying and statistically analyzing soybean pod types are crucial for seed evaluation and yield estimation. Traditional visual assessment by breeding personnel is time-consuming, labor-intensive, and prone to subjective bias, especially with large datasets. Automatic assessment methods typically struggle to separate the easily confused two-seed and three-seed pod types, which limits identification accuracy. To address these issues, we propose FEI-YOLO, an improved YOLOv5s object detection model designed to better differentiate pod types and to raise prediction efficiency. To reduce the number of parameters and the computational load, we integrate the FasterNet Block from the FasterNet model into the original C3 module, improving both detection accuracy and speed. To strengthen feature extraction and representation for specific targets, we incorporate the Efficient Multi-Scale Attention (EMA) module into the C3 module of the backbone network, improving the identification of similar pod types. Finally, Inner-IoU is combined with CIoU as the bounding-box regression loss to further enhance detection accuracy and generalization. Experiments comparing FEI-YOLO with the baseline YOLOv5s show that FEI-YOLO achieves an mAP@0.5 of 98.6% and an mAP@0.5:0.95 of 81.1%, improvements of 1.5% and 1.4%, respectively. Meanwhile, the number of parameters is reduced by 13.2% and FLOPs by 10.8%, demonstrating the model's effectiveness and efficiency and enabling rapid, accurate identification of soybean pod types from images.
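The abstract gives no implementation details, but the FasterNet Block it references is well documented in the FasterNet paper (Chen et al., 2023): a partial convolution (PConv) that convolves only a fraction of the channels, followed by two pointwise convolutions with a residual connection. The PyTorch sketch below illustrates that structure; the module names, the `n_div` ratio, and the idea of substituting it for the Bottleneck inside YOLOv5's C3 module are our assumptions for illustration, not the authors' exact code.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: a 3x3 conv applied to only the first
    1/n_div of the channels; the remaining channels pass through."""
    def __init__(self, dim, n_div=4):
        super().__init__()
        self.dim_conv = dim // n_div          # channels actually convolved
        self.dim_keep = dim - self.dim_conv   # channels left untouched
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, 3, 1, 1, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_keep], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)

class FasterNetBlock(nn.Module):
    """PConv followed by a two-layer pointwise MLP with a residual
    connection, as in FasterNet; hypothetically this would replace
    the Bottleneck inside YOLOv5's C3 module."""
    def __init__(self, dim, n_div=4, expansion=2):
        super().__init__()
        hidden = dim * expansion
        self.pconv = PConv(dim, n_div)
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, dim, 1, bias=False),
        )

    def forward(self, x):
        return x + self.mlp(self.pconv(x))
```

Because the spatial convolution touches only 1/n_div of the channels, such a block carries fewer parameters and FLOPs than the standard C3 Bottleneck, which is consistent with the reductions the abstract reports.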
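For the attention step, the following sketch reproduces the publicly released reference implementation of the EMA module (Ouyang et al., 2023): channels are split into groups, 1D average pooling along height and width produces channel-direction weights, and a cross-spatial branch combines them via matrix products. The grouping factor and the exact insertion point inside the backbone C3 module are assumptions on our part.

```python
import torch
import torch.nn as nn

class EMA(nn.Module):
    """Efficient Multi-Scale Attention, after the reference code of
    Ouyang et al. (2023). factor = number of channel groups."""
    def __init__(self, channels, factor=8):
        super().__init__()
        self.groups = factor
        self.softmax = nn.Softmax(-1)
        self.agp = nn.AdaptiveAvgPool2d((1, 1))
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # pool over width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # pool over height
        self.gn = nn.GroupNorm(channels // factor, channels // factor)
        self.conv1x1 = nn.Conv2d(channels // factor, channels // factor, 1)
        self.conv3x3 = nn.Conv2d(channels // factor, channels // factor, 3, padding=1)

    def forward(self, x):
        b, c, h, w = x.size()
        group_x = x.reshape(b * self.groups, -1, h, w)
        # 1D directional attention: encode H and W separately, then re-split
        x_h = self.pool_h(group_x)
        x_w = self.pool_w(group_x).permute(0, 1, 3, 2)
        hw = self.conv1x1(torch.cat([x_h, x_w], dim=2))
        x_h, x_w = torch.split(hw, [h, w], dim=2)
        x1 = self.gn(group_x * x_h.sigmoid() * x_w.permute(0, 1, 3, 2).sigmoid())
        x2 = self.conv3x3(group_x)
        # cross-spatial learning: each branch attends to the other's features
        x11 = self.softmax(self.agp(x1).reshape(b * self.groups, -1, 1).permute(0, 2, 1))
        x12 = x2.reshape(b * self.groups, c // self.groups, -1)
        x21 = self.softmax(self.agp(x2).reshape(b * self.groups, -1, 1).permute(0, 2, 1))
        x22 = x1.reshape(b * self.groups, c // self.groups, -1)
        weights = (torch.matmul(x11, x12) + torch.matmul(x21, x22)).reshape(
            b * self.groups, 1, h, w)
        return (group_x * weights.sigmoid()).reshape(b, c, h, w)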
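For the loss, Inner-IoU (Zhang et al., 2023) evaluates the IoU on auxiliary boxes scaled about each box's center; combined with CIoU this gives L = 1 - IoU_inner + (center-distance penalty) + (aspect-ratio penalty). The sketch below follows that published formulation; the scale factor `ratio=0.7` is a hypothetical default, since the abstract does not state the value used.

```python
import math
import torch

def inner_iou(box1, box2, ratio=0.7):
    """IoU of 'inner' auxiliary boxes: each (x1, y1, x2, y2) box is
    rescaled about its own center by `ratio` before overlap is measured."""
    def scale(box):
        cx, cy = (box[..., 0] + box[..., 2]) / 2, (box[..., 1] + box[..., 3]) / 2
        w, h = (box[..., 2] - box[..., 0]) * ratio, (box[..., 3] - box[..., 1]) * ratio
        return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

    ax1, ay1, ax2, ay2 = scale(box1)
    bx1, by1, bx2, by2 = scale(box2)
    inter = (torch.min(ax2, bx2) - torch.max(ax1, bx1)).clamp(0) * \
            (torch.min(ay2, by2) - torch.max(ay1, by1)).clamp(0)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / (union + 1e-7)

def inner_ciou_loss(pred, target, ratio=0.7):
    """CIoU loss with its plain IoU term swapped for Inner-IoU."""
    # plain IoU on the original boxes (needed for the alpha weight)
    inter = (torch.min(pred[..., 2], target[..., 2]) - torch.max(pred[..., 0], target[..., 0])).clamp(0) * \
            (torch.min(pred[..., 3], target[..., 3]) - torch.max(pred[..., 1], target[..., 1])).clamp(0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + 1e-7)
    # CIoU penalties: normalized center distance and aspect-ratio consistency
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + 1e-7
    rho2 = ((pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) ** 2 +
            (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) ** 2) / 4
    wp, hp = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    wt, ht = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    v = (4 / math.pi ** 2) * (torch.atan(wt / (ht + 1e-7)) - torch.atan(wp / (hp + 1e-7))) ** 2
    alpha = v / (1 - iou + v + 1e-7)
    return 1 - inner_iou(pred, target, ratio) + rho2 / c2 + alpha * v
```

With ratio < 1 the auxiliary boxes are smaller than the originals, which sharpens the gradient for high-IoU samples; ratio > 1 does the opposite for low-IoU samples.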