Abstract

The Variational Auto-Encoder (VAE) is a handy and computationally friendly Bayesian tool for Inverse Reinforcement Learning (IRL) problems, whose native setting is a Markov Decision Process (MDP) with an unknown reward function. However, recent works deal mainly with a single reward, which turns out to be insufficient for complex dynamic environments with multiple demonstrators of varying characteristics (and hence multiple reward functions). This paper extends the dimensionality of the reward (from ℝ to ℝ^K) by incorporating a latent embedding and clustering step on top of a scalable Bayesian IRL model, which enhances its applicability to multi-reward scenarios. We introduce our method, Clustered Autoencoded Variational Inverse Reinforcement Learning (CAVIRL), which approximates multiple posterior reward functions and learns the corresponding policies for experts of varying characteristics and skills. As a by-product, the proposed model also determines the number of clusters K on its own, as opposed to competing multi-reward imitation learning models that require K to be prespecified. We trained the proposed model in a grid world with multiple types of players, where we achieved 100% correctness in determining the number of player types and an 80%-83.9% match between the model-learned policies and the players' demonstrations in the data.