In this paper we propose a data-driven approach for multiple speaker tracking in reverberant enclosures. The speakers are uttering, possibly overlapping, speech signals while moving in the environment. The method comprises two stages. The first stage executes a single source localization using semi-supervised learning on multiple manifolds. The second stage, which is unsupervised, uses time-varying maximum likelihood estimation for tracking. The feature vectors, used by both stages, are the relative transfer functions (RTFs), which are known to be related to source positions. The number of sources is assumed to be known while the microphone positions are unknown. In the training stage, a large database of RTFs is given. A small percentage of the data is attributed with exact positions (namely, labelled data) and the rest is assumed to be unlabelled, i.e. the respective position is unknown. Then, a nonlinear, manifold-based, mapping function between the RTFs and the source positions is inferred. Applying this mapping function to all unlabelled RTFs constructs a dense grid of localized sources. In the test phase, this RTFs grid serves as the centroids for a (MoG) model. The MoG parameters are estimated by applying a recursive variant of the (EM) procedure that relies on the sparsity and intermittency of the speech signals. We present a comprehensive simulation study in various reverberation levels, including static and dynamic scenarios, for both two or three (partially) overlapping speakers. For the dynamic case we provide simulations with several speakers trajectories, including intersecting sources. The proposed scheme outperforms baseline methods that use a simpler propagation model in terms of localization accuracy and tracking capabilities.