Accurate segmentation of the temporomandibular joint (TMJ) from cone beam CT (CBCT) images holds significant clinical value for diagnosing temporomandibular joint osteoarthrosis (TMJOA) and related conditions. Convolutional neural network-based medical image segmentation methods have achieved state-of-the-art performance in various segmentation tasks. However, 3D medical images segmentation requires substantial global context and rich spatial semantic information, demanding much more GPU memory and computational resources. To address these challenges in 3D medical image segmentation, we propose a novel network- the MVEL-Net (Multi-view Ensemble Learning Network) for TMJ CBCT image segmentation. By resampling images along three dimensions, we generate multiple weak learners with different spatial semantic information. A subsequent strong learning network effectively integrates the outputs from these weak learners to achieve more accurate segmentation results. We evaluated our network model using a clinical dataset comprising 88 subjects with TMJ CBCT images. The average Dice similarity coefficient (DSC) was 0.9817 ± 0.0049, the average surface distance was 0.0540 ± 0.0179mm, and the 95% Hausdorff distance was 0.1743 ± 0.0550mm. Our proposed MVEL-Net demonstrates excellent segmentation performance on TMJ from CBCT images, while using fewer GPU memory resources compared to other 3D networks. The effectiveness of this method in capturing spatial context could be leveraged for tasks like organ segmentation from volumetric scans. This may facilitate wider adoption of AI-based solutions for automated analysis of 3D medical images.