How do we know when a reinforcement learning policy needs to adapt? In non-stationary environments, agents must adapt and learn in environments that change dynamically. We propose a finite-horizon model-free solution using a hierarchical learning structure with fuzzy systems. The higher-level learning policy advises the lower-level policy when to start and stop learning based on the temporal differences calculated within the lower-level. Major differences in the temporal difference of each action produced by an agent may indicate environment change. This structure is tested with multi-agent differential games in both the cooperative and competitive aspect. Results show that this method is quick to notice and adapt the policy within relatively few learning episodes.