This study develops an artificial neural network (ANN) model that classifies pavement types using acoustic and image data. Conventional studies often classify pavements from road-surface images, but image quality degrades under external factors such as sun angle, shadows, and lighting. Therefore, this study uses tire-pavement noise, whose characteristics vary with pavement material and surface treatment, both on its own and together with image data to train the ANN. To construct the training dataset, tire-pavement noise and road-surface images are collected from 11 highway sampling sites in South Korea. Two measurements are made simultaneously: the tire-pavement noise is recorded using the on-board sound intensity (OBSI) method, and a camera captures the road-surface images. The 1/3-octave-band sound intensity level (SIL), the spectrum, Mel-frequency cepstral coefficients (MFCC), gray-level co-occurrence matrix (GLCM) features, and histogram of oriented gradients (HOG) features are extracted from the raw data, and ANN models are trained on these features. Using the spectrum as the input feature yields a classification accuracy of 95.18%; however, this ANN has twice as many parameters as the other models. Training on the 1/3-octave-band SIL instead halves the model size, but the accuracy decreases by 13.47 percentage points. To overcome this large drop, the ANN is trained on the 1/3-octave-band SIL and image features together, which raises the accuracy to 93.85%. Training the ANN on MFCC, which is commonly used as an acoustic feature in other machine learning studies, achieves the highest classification accuracy, 96.84%. The performance of the MFCC models depends on the number of coefficients and on the signal length. To capture the dominant frequencies of tire-pavement noise, more than the 13 coefficients generally considered suitable for speech recognition are used.
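As a rough illustration of why the 1/3-octave-band SIL input is much smaller than the full spectrum, the Python sketch below sums a narrowband intensity spectrum into base-2 1/3-octave bands and converts each band to a level in dB re 10^-12 W/m². The 400 to 5000 Hz band range, the narrowband input format, and the helper names third_octave_bands and band_sil are assumptions made for illustration; this is not the study's implementation.

```python
import math

I_REF = 1e-12  # reference sound intensity, W/m^2

def third_octave_bands(f_low=400.0, f_high=5000.0):
    """Base-2 1/3-octave bands as (lower edge, exact center, upper edge) tuples.

    Band index k has center 1000 * 2**(k/3) Hz and edges center * 2**(+/-1/6).
    The 400-5000 Hz default range is an illustrative assumption.
    """
    # Round the nominal limits to the nearest band index.
    k_min = round(3 * math.log2(f_low / 1000.0))
    k_max = round(3 * math.log2(f_high / 1000.0))
    bands = []
    for k in range(k_min, k_max + 1):
        fc = 1000.0 * 2.0 ** (k / 3.0)
        bands.append((fc * 2.0 ** (-1.0 / 6.0), fc, fc * 2.0 ** (1.0 / 6.0)))
    return bands

def band_sil(freqs, intensities, bands):
    """Sum narrowband intensities (W/m^2) within each band and convert to dB re I_REF.

    Bands that contain no energy get -inf, so the output always has len(bands) entries.
    """
    levels = []
    for lo, _, hi in bands:
        total = sum(i for f, i in zip(freqs, intensities) if lo <= f < hi)
        levels.append(10.0 * math.log10(total / I_REF) if total > 0 else float("-inf"))
    return levels
```

Under the assumed range, a spectrum with any number of frequency bins reduces to 12 band levels. That smaller input vector is consistent with the smaller model reported above.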
Increasing the number of coefficients from 13 to 40 improves accuracy by 1.17 percentage points. Shortening the interval at which the raw WAV files are sliced increases the amount of training data and allows pavements to be classified from shorter signals without a statistically significant loss of accuracy. Accuracy does not decrease for signal lengths down to 0.5 s, but it drops rapidly for signals shorter than 0.4 s.
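A minimal sketch of the slicing step, assuming 16-bit mono PCM WAV recordings and fixed-length segments: shorter segments yield more training examples from the same recording. The study does not state whether segments overlap, so non-overlapping slicing is the default here and the hop_seconds option is a hypothetical addition; the function names read_wav_mono16 and slice_segments are likewise hypothetical.

```python
import io
import struct
import wave

def read_wav_mono16(source):
    """Read a 16-bit mono WAV (path or file-like object) into a list of int samples."""
    with wave.open(source, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("this sketch assumes 16-bit mono PCM")
        sample_rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    # WAV PCM data is little-endian; "h" is a signed 16-bit integer.
    samples = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
    return samples, sample_rate

def slice_segments(samples, sample_rate, seg_seconds, hop_seconds=None):
    """Cut the signal into fixed-length segments; a trailing partial segment is dropped.

    hop_seconds defaults to seg_seconds (no overlap); a smaller hop gives overlapping segments.
    """
    seg = int(round(seg_seconds * sample_rate))
    hop = seg if hop_seconds is None else int(round(hop_seconds * sample_rate))
    if seg <= 0 or hop <= 0:
        raise ValueError("segment and hop lengths must be positive")
    return [samples[s:s + seg] for s in range(0, len(samples) - seg + 1, hop)]

if __name__ == "__main__":
    # Build a 1 s, 8 kHz silent WAV in memory so the example needs no files.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(8000)
        w.writeframes(struct.pack("<8000h", *([0] * 8000)))
    buf.seek(0)
    samples, sr = read_wav_mono16(buf)
    for length in (1.0, 0.5, 0.4, 0.1):
        print(length, len(slice_segments(samples, sr, length)))
```

For the 1 s demo signal, 0.5 s and 0.4 s segments both give two examples (the 0.4 s case drops a 0.2 s remainder), while 0.1 s segments give ten.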