Apple cultivation is an important part of agriculture in Xinjiang, and a growing labor shortage is driving demand for automated harvesting, which in turn depends on reliable apple recognition in complex environments. CNN-based models identify apples accurately in conventional settings, but their limited receptive fields prevent them from fully capturing global features in noisy, complex scenes, so they struggle to reach the accuracy and robustness that practical applications require. To address this problem, this study proposes FC-DETR, a real-time object detection model. The model combines the FEMA-BasicBlock residual module, the CAFM cross-scale adaptive feature fusion module, and the Inner-WIoU loss function, which together enhance feature processing, multi-scale feature selection and integration, and detection accuracy. Experimental results show that FC-DETR achieves an apple recognition accuracy of 87% and a recall of 82% in complex backgrounds while remaining lightweight. These results advance automated apple harvesting technology and can help improve the efficiency and sustainability of the apple industry in Xinjiang and worldwide.