Vision-based localization of orchard mobile robots is an indispensable component of intelligent orchard management, applicable to scenarios such as navigation and harvesting. Owing to the rugged terrain and the scarcity of texture features in orchard environments, the accuracy of feature-point-based Visual-Inertial Odometry (VIO) is compromised, making it difficult to meet practical application requirements. To address these issues, this paper proposes a stereo visual-inertial localization algorithm based on point-line features, built by enhancing VINS-Fusion. The algorithm incorporates the Line Segment Detector (LSD) into the odometry front-end to extract line features, mitigating the shortage of effective feature points and the low quality of feature matches in orchard environments. In addition, length-filtering and feature-matching strategies are introduced for line features to eliminate poor-quality line segments and mismatches caused by rugged terrain, thereby reducing their adverse impact on localization accuracy. To prevent keyframe redundancy caused by the scarcity of stable feature points in texture-deficient areas such as tree trunks and leaves, a keyframe selection criterion based on point-line features is designed. The back-end optimization minimizes a cost function comprising point and line feature residuals, IMU residuals, and prior information, ultimately improving the localization accuracy of mobile robots. To validate the effectiveness and practicality of the proposed algorithm, a self-built dataset was collected in real orchard environments, covering a banana plantation, a mango orchard, and a wampee orchard. Localization experiments were conducted on the public Rosario and EuRoC datasets as well as the self-built orchard dataset. The results indicate that the proposed algorithm consistently achieves lower ATE-RMSE than VINS-Fusion, with maximum reductions of 79.2% on Rosario sequences and 23.3% on EuRoC sequences, and ATE-RMSE values of 0.091 m, 0.107 m, and 0.093 m in the banana plantation, mango orchard, and wampee orchard, respectively, representing reductions of 22.9%, 7.6%, and 27.3% relative to VINS-Fusion. On the self-built orchard dataset, the average per-frame processing time is 56.3 ms for the front-end and 31 ms for the back-end, meeting practical localization requirements. The proposed algorithm effectively enhances the localization accuracy of orchard mobile robots, offering a viable solution for autonomous navigation in orchard robot systems.
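To make the front-end length-filtering idea concrete, the following is a minimal sketch (not the paper's implementation) of detecting LSD line segments and discarding short, unstable ones. It assumes OpenCV's createLineSegmentDetector, which is absent from some OpenCV builds (it was removed for licensing reasons and restored in 4.5.1+); the threshold of 10% of the image diagonal is an illustrative choice, not a value from the paper.

```python
import cv2
import numpy as np

def detect_filtered_lines(gray, min_len_ratio=0.1):
    """Return LSD line segments longer than a fraction of the image diagonal.

    min_len_ratio is an illustrative placeholder, not the paper's value.
    """
    lsd = cv2.createLineSegmentDetector(cv2.LSD_REFINE_STD)
    lines, _, _, _ = lsd.detect(gray)           # lines: (N, 1, 4) or None
    if lines is None:
        return np.empty((0, 4), dtype=np.float32)
    segs = lines.reshape(-1, 4)                 # rows: (x1, y1, x2, y2)
    lengths = np.hypot(segs[:, 2] - segs[:, 0], segs[:, 3] - segs[:, 1])
    min_len = min_len_ratio * np.hypot(*gray.shape[:2])
    # Short segments are the ones most easily corrupted by motion blur
    # and viewpoint change on rugged terrain, so they are dropped.
    return segs[lengths >= min_len]

if __name__ == "__main__":
    demo = np.zeros((240, 320), dtype=np.uint8)
    cv2.line(demo, (20, 30), (300, 200), 255, 2)  # synthetic long edge
    print(detect_filtered_lines(demo))
```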
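The keyframe criterion can be illustrated in the same spirit. The sketch below is our own illustration, not the paper's code: it extends the usual parallax-or-track-count test so that a frame is only promoted to a keyframe when both point and line tracks have decayed, which suppresses redundant keyframes in texture-poor regions where points alone are scarce. All thresholds are hypothetical placeholders.

```python
def is_keyframe(avg_parallax, n_tracked_points, n_tracked_lines,
                parallax_thresh=10.0,   # pixels, illustrative
                min_point_tracks=20,    # illustrative
                min_line_tracks=15):    # illustrative
    """Declare a keyframe on large parallax, or when BOTH point and
    line tracks have decayed (rather than points alone)."""
    if avg_parallax > parallax_thresh:
        return True
    # Requiring both feature types to decay avoids spawning keyframes
    # on trunk/leaf regions where points vanish but lines stay stable.
    return (n_tracked_points < min_point_tracks and
            n_tracked_lines < min_line_tracks)
```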
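For context on the back-end objective, a point-line tightly coupled formulation typically takes the form below. The notation follows the published VINS-Mono/VINS-Fusion objective with an added line-residual term; it is a sketch of the general structure, not the paper's exact formulation.

\[
\min_{\mathcal{X}} \Big\{ \big\| \mathbf{r}_p - \mathbf{H}_p \mathcal{X} \big\|^2
+ \sum_{k\in\mathcal{B}} \big\| \mathbf{r}_{\mathcal{B}}\big(\hat{\mathbf{z}}^{b_{k+1}}_{b_k}, \mathcal{X}\big) \big\|^2_{\mathbf{P}^{b_{k+1}}_{b_k}}
+ \sum_{(l,j)\in\mathcal{P}} \rho\Big( \big\| \mathbf{r}_{\mathcal{P}}\big(\hat{\mathbf{z}}^{c_j}_{l}, \mathcal{X}\big) \big\|^2_{\mathbf{P}^{c_j}_{l}} \Big)
+ \sum_{(m,j)\in\mathcal{L}} \rho\Big( \big\| \mathbf{r}_{\mathcal{L}}\big(\hat{\mathbf{z}}^{c_j}_{m}, \mathcal{X}\big) \big\|^2_{\mathbf{P}^{c_j}_{m}} \Big) \Big\}
\]

Here \(\mathcal{X}\) denotes the sliding-window states, \(\{\mathbf{r}_p, \mathbf{H}_p\}\) the prior from marginalization, \(\mathbf{r}_{\mathcal{B}}\) the IMU preintegration residuals, \(\mathbf{r}_{\mathcal{P}}\) and \(\mathbf{r}_{\mathcal{L}}\) the point and line reprojection residuals, and \(\rho(\cdot)\) a robust (e.g., Huber) norm.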