Haptic rendering of surface textures enhances user immersion in human–computer interaction. However, strict requirements on input conditions and measurement methods limit the diversity of existing rendering algorithms. To address this, we propose a neural network-based approach for vibrotactile haptic rendering of surface textures under unconstrained acquisition conditions. The method first encodes the interactions based on human perceptual characteristics, and then utilizes an autoregressive model to learn a nonlinear mapping between the encoded data and haptic features. The interactions consist of normal forces and sliding velocities, while the haptic features are time–frequency amplitude spectrograms obtained by applying the short-time Fourier transform to the accelerations corresponding to these interactions. Finally, a generative adversarial network is employed to convert the generated time–frequency amplitude spectrograms back into acceleration signals. The effectiveness of the proposed approach is confirmed through numerical evaluations and subjective experiments. This approach enables the rendering of a wide range of vibrotactile data for surface textures under unconstrained acquisition conditions, achieving a high level of haptic realism.
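For concreteness, the following minimal sketch (Python, using scipy.signal.stft) illustrates how such a time–frequency amplitude spectrogram can be computed from a recorded acceleration trace; this is not the authors' implementation, and the sampling rate, window length, and overlap below are assumptions for illustration, not values from the paper.

```python
import numpy as np
from scipy.signal import stft

# Illustrative sketch of the haptic-feature extraction step: the
# time-frequency amplitude spectrogram of an acceleration signal via the
# short-time Fourier transform. All parameters here are assumptions.
fs = 10_000                              # assumed accelerometer sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
acceleration = np.random.randn(t.size)   # placeholder for a measured signal

# STFT with an assumed 256-sample Hann window and 50% overlap.
freqs, times, Zxx = stft(acceleration, fs=fs, window="hann",
                         nperseg=256, noverlap=128)
amplitude_spectrogram = np.abs(Zxx)      # haptic feature: |STFT| magnitudes
print(amplitude_spectrogram.shape)       # (n_freq_bins, n_time_frames)
```

In the proposed pipeline, spectrograms of this form serve as the prediction target of the autoregressive model, and the generative adversarial network inverts them back to acceleration waveforms.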