Accurate coffee plant counting is a crucial metric for yield estimation and a key component of precision agriculture. While multispectral UAV technology provides more accurate crop growth data, the varying spectral characteristics of coffee plants across different phenological stages complicate automatic plant counting. This study compared the performance of mainstream YOLO models for coffee detection and segmentation, identifying YOLOv9 as the best-performing model, with it achieving high precision in both detection (P = 89.3%, mAP50 = 94.6%) and segmentation performance (P = 88.9%, mAP50 = 94.8%). Furthermore, we studied various spectral combinations from UAV data and found that RGB was most effective during the flowering stage, while RGN (Red, Green, Near-infrared) was more suitable for non-flowering periods. Based on these findings, we proposed an innovative dual-channel non-maximum suppression method (dual-channel NMS), which merges YOLOv9 detection results from both RGB and RGN data, leveraging the strengths of each spectral combination to enhance detection accuracy and achieving a final counting accuracy of 98.4%. This study highlights the importance of integrating UAV multispectral technology with deep learning for coffee detection and offers new insights for the implementation of precision agriculture.