Variational autoencoder architectures have the potential to develop reduced-order models for chaotic fluid flows. We propose a method for learning compact and near-orthogonal reduced-order models using a combination of a β-variational autoencoder and a transformer, tested on numerical data from a two-dimensional viscous flow in both periodic and chaotic regimes. The β-variational autoencoder is trained to learn a compact latent representation of the flow velocity, and the transformer is trained to predict the temporal dynamics in latent-space. Using the β-variational autoencoder to learn disentangled representations in latent-space, we obtain a more interpretable flow model with features that resemble those observed in the proper orthogonal decomposition, but with a more efficient representation. Using Poincaré maps, the results show that our method can capture the underlying dynamics of the flow outperforming other prediction models. The proposed method has potential applications in other fields such as weather forecasting, structural dynamics or biomedical engineering.