Existing recommender systems rely on user and item representations in a fixed, continuous, low-dimensional latent space. To predict ratings, they use only an implicit feedback matrix and ignore user and item side information. Furthermore, they place the same arbitrary priors on the user and item latent vectors, which reduces the model's ability to identify the actual latent vectors. In addition, the latent parameters must be learned directly for every user and item. This is problematic because adding users or items to the underlying dataset requires both retraining the model and learning the new latent vector representations. To address these issues, we propose a two-step hybrid variational Bayesian autoencoder that characterizes the uncertainty of predicted ratings. First, an encoder is trained to map data vectors to the latent space, so that latent representations can be computed dynamically. Subsequently, we use the generative processes of users and items, with their priors serving as side-specific information, to handle matrix sparsity and better learn their latent vectors. Finally, we use stochastic variational inference to approximate the intractable posterior density of the user and item latent vectors. Experiments conducted on two real-world datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art methods.
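To illustrate how an encoder lets latent representations be computed rather than learned per user, the following is a minimal sketch and not the proposed model itself. It assumes a one-hidden-layer Gaussian encoder with random, untrained weights, a diagonal Gaussian prior, and the standard reparameterization trick; the names `init_encoder`, `encode`, `sample_latent`, and `kl_to_gaussian_prior`, the layer sizes, and the example rating vector are all hypothetical.

```python
import numpy as np


def init_encoder(n_inputs, n_hidden, n_latent, rng):
    """Random (untrained) weights for a hypothetical one-hidden-layer encoder."""
    scale = 0.1
    return {
        "W_h": rng.normal(0.0, scale, (n_inputs, n_hidden)),
        "b_h": np.zeros(n_hidden),
        "W_mu": rng.normal(0.0, scale, (n_hidden, n_latent)),
        "b_mu": np.zeros(n_latent),
        "W_lv": rng.normal(0.0, scale, (n_hidden, n_latent)),
        "b_lv": np.zeros(n_latent),
    }


def encode(params, x):
    """Map a data vector to the mean and log-variance of q(z | x)."""
    h = np.tanh(x @ params["W_h"] + params["b_h"])
    mu = h @ params["W_mu"] + params["b_mu"]
    log_var = h @ params["W_lv"] + params["b_lv"]
    return mu, log_var


def sample_latent(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_gaussian_prior(mu, log_var, prior_mu, prior_log_var):
    """KL(q || p) between two diagonal Gaussians, summed over latent dims."""
    var_ratio = np.exp(log_var - prior_log_var)
    sq_diff = (mu - prior_mu) ** 2 / np.exp(prior_log_var)
    return 0.5 * np.sum(
        var_ratio + sq_diff - 1.0 - (log_var - prior_log_var), axis=-1
    )


rng = np.random.default_rng(0)
params = init_encoder(n_inputs=5, n_hidden=8, n_latent=3, rng=rng)

# A previously unseen user's feedback vector over 5 items (made-up values).
new_user = np.array([5.0, 0.0, 3.0, 0.0, 1.0])

# The latent vector is obtained by a forward pass, with no per-user training.
mu, log_var = encode(params, new_user)
z = sample_latent(mu, log_var, rng)

# Assumed prior; in a side-information setting its mean and variance could
# depend on user features instead of being fixed to a standard normal.
prior_mu, prior_log_var = np.zeros(3), np.zeros(3)
kl = kl_to_gaussian_prior(mu, log_var, prior_mu, prior_log_var)
```

In this sketch the spread encoded by `log_var` is what carries uncertainty about the latent vector, and the KL term is the piece of a variational objective that ties the approximate posterior to the chosen prior.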