The multiple object tracking (MOT) task has been a useful tool for studying the deployment of limited-capacity visual resources over time. Because it involves sustained attention to multiple objects, this task is a promising model for real-world visual cognition. However, real-world tasks differ in two critical ways from standard laboratory MOT designs. First, in real-world tracking, it is unusual for the set of tracked items to be identified all at once and to remain unchanged over time. Second, real-world tracking tasks may need to be sustained over a period of minutes, not mere seconds. How well is MOT performance maintained over extended periods of time? In four experiments, we demonstrate that observers can dynamically "juggle" objects in and out of the tracked set with little apparent cost, and can sustain this performance for up to 10 min at a time. This performance requires implicit or explicit feedback. In the absence of feedback, tracking performance drops steadily over the course of several minutes.