Objective: Scaphoid fractures, particularly occult and non-displaced fractures, are difficult to detect on conventional X-ray images because of their subtle appearance and variability in bone density. This study proposes a two-stage CNN approach that detects and classifies scaphoid fractures from anterior–posterior (AP) and lateral (LA) X-ray views to support more accurate diagnosis.

Methods: The approach uses multi-view X-ray images (AP and LA views) to improve fracture detection and classification. A multi-view fusion module integrates information from both views to enhance detection accuracy, particularly for occult fractures that may not be visible in a single view. The proposed method has two stages. Stage 1 detects the scaphoid bone using Faster R-CNN with a Feature Pyramid Network (FPN) for region proposal and small-object detection; scaphoid localization accuracy is 100%, with Intersection over Union (IoU) scores of 0.8662 for AP views and 0.8478 for LA views. Stage 2 performs fracture classification using a ResNet backbone and FPN together with the multi-view fusion module, which combines features from the AP and LA views; this stage achieves a classification accuracy of 89.94%, a recall of 87.33%, and a precision of 90.36%.

Results: The proposed model performs well in both scaphoid bone detection and fracture classification. The multi-view fusion approach significantly improves recall and accuracy in detecting fractures compared with single-view approaches. In scaphoid detection, both AP and LA views achieved 100% detection accuracy. In fracture detection with multi-view fusion, accuracy reached 87.16% for AP views and 83.83% for LA views.

Conclusions: The multi-view fusion model effectively improves the detection of scaphoid fractures, particularly occult and non-displaced fractures. The model provides a reliable, automated approach to assist clinicians in detecting and diagnosing scaphoid fractures more efficiently.
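
The abstract reports IoU for scaphoid localization and accuracy, recall, and precision for fracture classification without spelling out how these are computed. The following is a minimal sketch, assuming the standard definitions and axis-aligned bounding boxes given as (x_min, y_min, x_max, y_max); the function names `box_iou` and `classification_metrics` are hypothetical and are not taken from the study's code.

```python
from typing import Dict, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (if the boxes overlap at all).
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so disjoint boxes give an empty intersection.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, recall, and precision from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }
```

Under these definitions, a predicted scaphoid box would be compared against its annotated box with `box_iou(predicted, annotated)`, and the per-image fracture predictions would be tallied into true/false positives and negatives before calling `classification_metrics`. The counts used to produce the figures reported in the abstract are not given there, so none are reproduced here.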