Monitoring road surface conditions in advance provides valuable information for vehicle trajectory planning and active control systems. Road surface perception with vision sensors is an emerging technology that has recently attracted considerable attention. However, few reports describe a complete technical scheme and its practical application to driving assistance. This study first creates a large-scale road surface image dataset containing one million samples with detailed annotations of road friction level, material, and unevenness level. A convolutional neural network (CNN) classification model is then trained on the dataset using a combined loss function and adapted optimization strategies. Next, we propose a decision-level fusion method based on an improved Dempster-Shafer evidence theory to enhance the robustness of the classification algorithm. Finally, the developed models are deployed on an embedded hardware platform. The top-1 accuracy for classifying road surface images reaches 92.05% with the CNN model alone and 97.50% after fusion. Onboard experiments demonstrate the advantages of the developed technical framework, which shows strong potential for real-vehicle driving assistance applications.
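To make the idea of decision-level fusion concrete, the following is a minimal sketch of the classical Dempster combination rule applied to per-image class probabilities. It is not the improved variant proposed in this study, whose details are not given here. The sketch assumes that each CNN output is treated as a mass function over singleton classes only, and that fusion is done across several images of the same road patch. The function names `dempster_combine` and `fuse_predictions`, the three-class setup, and the numeric values are all hypothetical.

```python
from typing import List, Sequence


def dempster_combine(m1: Sequence[float], m2: Sequence[float]) -> List[float]:
    """Classical Dempster rule for masses assigned to singleton classes only.

    With singleton-only masses, two sources agree only when they name the
    same class, so the combined mass is the normalized element-wise product.
    """
    if len(m1) != len(m2):
        raise ValueError("mass vectors must cover the same classes")
    joint = [a * b for a, b in zip(m1, m2)]
    agreement = sum(joint)  # equals 1 - K, where K is the conflict mass
    if agreement <= 0.0:
        raise ValueError("total conflict: no class is supported by both sources")
    return [j / agreement for j in joint]


def fuse_predictions(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Fold the combination rule over a sequence of class-probability vectors."""
    fused = list(predictions[0])
    for p in predictions[1:]:
        fused = dempster_combine(fused, p)
    return fused


# Hypothetical softmax outputs for three consecutive frames and three
# illustrative classes (values are made up for this example).
frames = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.7, 0.2, 0.1],
]
fused = fuse_predictions(frames)
best_class = max(range(len(fused)), key=fused.__getitem__)
print(best_class, [round(v, 4) for v in fused])
```

In this sketch, moderate but consistent evidence across frames is sharpened into a more confident fused decision. This is the general motivation for fusing at the decision level. The classical rule is known to behave poorly when sources conflict strongly, which is a common reason for using modified variants.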