Haptic rendering of surface textures enhances user immersion in human–computer interaction. However, strict input conditions and measurement methods limit the diversity of rendering algorithms. To address this, we propose a neural network-based approach for vibrotactile haptic rendering of surface textures under unconstrained acquisition conditions. The method first encodes the interactions, which consist of normal forces and sliding velocities, according to human perception characteristics. It then uses an autoregressive model to learn a non-linear mapping between the encoded data and the haptic features, which are time-frequency amplitude spectrograms obtained by applying the short-time Fourier transform to the accelerations corresponding to the interactions. Finally, a generative adversarial network converts the generated time-frequency amplitude spectrograms into acceleration signals. The effectiveness of the proposed approach is confirmed through numerical evaluation and subjective experiments. The approach enables the rendering of a wide range of vibrotactile data for surface textures under unconstrained acquisition conditions while achieving a high level of haptic realism.
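The haptic feature described above, an amplitude spectrogram computed from an acceleration signal by the short-time Fourier transform, can be illustrated with a minimal sketch. The Hann window, the 256-sample frame, the 64-sample hop, the 1 kHz sampling rate, and the synthetic 200 Hz signal are all assumptions chosen for illustration, and the function name `amplitude_spectrogram` is hypothetical; none of these come from the method itself.

```python
import numpy as np


def amplitude_spectrogram(accel, frame_len=256, hop=64):
    """Return STFT magnitudes with shape (n_frames, frame_len // 2 + 1).

    frame_len and hop are illustrative assumptions, not values from the paper.
    """
    accel = np.asarray(accel, dtype=float)
    if accel.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)  # assumed window type
    n_frames = 1 + (accel.size - frame_len) // hop
    # Slice the signal into overlapping frames, one frame per row.
    frames = np.stack(
        [accel[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    # Keep only the magnitude of the one-sided spectrum; phase is discarded.
    return np.abs(np.fft.rfft(frames * window, axis=1))


# Synthetic stand-in for a recorded acceleration trace (assumed values).
fs = 1000.0                       # sampling rate in Hz
t = np.arange(1000) / fs          # one second of samples
accel = 0.5 * np.sin(2 * np.pi * 200.0 * t)

spec = amplitude_spectrogram(accel)
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * fs / 256     # bin spacing is fs / frame_len
```

Because only magnitudes are kept, the phase is lost, so the spectrogram alone does not determine a unique waveform; this is consistent with a separate model being needed to convert generated spectrograms back into accelerations.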