This paper proposes a 3D vehicle-detection algorithm based on multimodal feature fusion to address the low vehicle-detection accuracy in environment perception for unmanned systems. The algorithm jointly calibrates the millimeter-wave radar and the camera to align the coordinate systems of the two sensors and reduce sampling errors. Statistical filtering is applied to the millimeter-wave radar data to remove redundant points and reduce outlier interference. A multimodal feature fusion module is constructed to fuse the point cloud and image information by pixel-by-pixel averaging. Feature pyramids are added to extract high-level information from the fused features, which improves detection accuracy in complex road scenarios. A feature fusion region proposal structure is established to generate region proposals from this high-level feature information. Redundant detection frames are removed by non-maximum suppression, and the final vehicle detection results are obtained by matching the vertices of the remaining detection frames. Experimental results on the KITTI dataset show that the proposed method improves both the efficiency and the accuracy of vehicle detection, with an average detection time of 0.14 s and an average detection accuracy of 84.71%.
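Three of the steps named above (statistical filtering, pixel-by-pixel averaging, and non-maximum suppression) can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the function names, the k-nearest-neighbor formulation of the statistical filter, the parameters `k`, `std_ratio`, and `iou_threshold`, and the use of axis-aligned 2D boxes instead of 3D detection frames are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def statistical_filter(points, k=2, std_ratio=1.0):
    """Drop points whose mean distance to their k nearest neighbours
    exceeds (global mean + std_ratio * global std). Assumed formulation."""
    mean_dists = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_dists.append(mean(dists[:k]))
    threshold = mean(mean_dists) + std_ratio * pstdev(mean_dists)
    return [p for p, d in zip(points, mean_dists) if d <= threshold]


def fuse_mean(image_feat, cloud_feat):
    """Element-wise (pixel-by-pixel) average of two equally sized 2D feature maps."""
    return [[(a + b) / 2.0 for a, b in zip(row_i, row_c)]
            for row_i, row_c in zip(image_feat, cloud_feat)]


def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes,
    ordered from highest to lowest score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Toy data (hypothetical): four clustered radar points and one far outlier.
radar_points = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (0.1, 0.1, 0), (5, 5, 5)]
filtered_points = statistical_filter(radar_points)

fused_map = fuse_mean([[1.0, 2.0], [3.0, 4.0]], [[3.0, 4.0], [5.0, 6.0]])

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = nms(boxes, scores=[0.9, 0.8, 0.7])
```

In this toy run the isolated point is discarded by the filter, the fused map holds the per-pixel mean of the two inputs, and the second box is suppressed because it overlaps the higher-scoring first box. A full pipeline would operate on calibrated sensor data and 3D detection frames, which this sketch does not attempt to reproduce.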