We propose an unsupervised probabilistic framework for learning a human-centred representation of a person’s environment from first-person video. Specifically, non-geometric maps modelled as hierarchies of probabilistic place graphs and view graphs are learned. Place graphs model a user’s patterns of transition between physical locations whereas view graphs capture an aspect of user behaviour within those locations. Furthermore, we describe an implementation in which the notion of place is divided into stations and the routes that interconnect them. Stations typically correspond to rooms or areas where a user spends time. Visits to stations are temporally segmented based on qualitative visual motion. We describe how to learn maps online in an unsupervised manner, and how to localise the user within these maps. We report experiments on two datasets, including comparison of performance with and without view graphs, and demonstrate better online mapping than when using offline clustering.