We present a hierarchical, grid-based, globally optimal tracking-by-detection approach for tracking an unknown number of targets in complex and dense scenarios, particularly addressing the challenges of complex interactions and mutual occlusions. Frame-by-frame detection is performed with hierarchical likelihood grids, which match shape templates through a fast oriented distance transform. To allow recovery from misdetections, common heuristics such as non-maxima suppression of the observations are eschewed. Within a discretized state space, the data association problem is formulated as a grid-based network flow model, resulting in a convex problem cast in integer linear programming form, which yields a globally optimal solution. In addition, we show how a behavior cue (body orientation) can be integrated into our association affinity model, providing valuable hints for resolving ambiguities between crossing trajectories. Unlike traditional motion-based approaches, we estimate body orientation with a hybrid method that combines the merits of motion-based and 3D appearance-based orientation estimation and can therefore also handle standing or slowly moving targets. The performance of our method is demonstrated through experiments on a wide variety of benchmark video sequences, including both indoor and outdoor scenarios.
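To make the network flow view of data association concrete, the following is a minimal sketch of a generic min-cost-flow tracker over grid-cell detections. It is an illustration under stated assumptions, not the formulation used in this work: the function name `min_cost_tracks`, the cost values (`det_cost`, `entry_cost`, `exit_cost`), the Manhattan-distance transition cost, and the `max_step` gating are all hypothetical, and the solver is a plain successive-shortest-path routine rather than an integer linear programming solver. Each detection becomes a unit-capacity edge with a negative cost, so a trajectory is accepted only if it lowers the total cost, which lets the number of targets remain unknown in advance.

```python
def min_cost_tracks(frames, det_cost=-3.0, entry_cost=1.0, exit_cost=1.0, max_step=2):
    """Link detections into tracks with unit-capacity min-cost flow.

    frames: list (one entry per frame) of lists of (x, y) grid cells.
    Returns a list of tracks; each track is a list of (frame_index, (x, y)).
    All cost parameters are illustrative assumptions.
    """
    S, T = "S", "T"
    graph = {}   # node -> indices into `edges`
    edges = []   # [target, residual capacity, cost]; edge i is paired with i ^ 1

    def add_edge(u, v, cost):
        # Forward edge gets an even index, its residual twin the next odd index.
        for a, b, cap, c in ((u, v, 1, cost), (v, u, 0, -cost)):
            graph.setdefault(a, []).append(len(edges))
            edges.append([b, cap, c])

    for t, dets in enumerate(frames):
        for i, p in enumerate(dets):
            add_edge(("in", t, i), ("out", t, i), det_cost)  # reward for using a detection
            add_edge(S, ("in", t, i), entry_cost)            # a track may start here
            add_edge(("out", t, i), T, exit_cost)            # a track may end here
            if t + 1 < len(frames):
                for j, q in enumerate(frames[t + 1]):
                    d = abs(p[0] - q[0]) + abs(p[1] - q[1])
                    if d <= max_step:                        # gate implausible jumps
                        add_edge(("out", t, i), ("in", t + 1, j), float(d))

    def shortest_path():
        # Bellman-Ford on the residual graph (edge costs may be negative).
        inf = float("inf")
        dist = {n: inf for n in graph}
        dist[S] = 0.0
        prev = {}
        for _ in range(len(graph) - 1):
            changed = False
            for u in graph:
                if dist[u] == inf:
                    continue
                for ei in graph[u]:
                    v, cap, c = edges[ei]
                    if cap > 0 and dist[u] + c < dist[v] - 1e-12:
                        dist[v], prev[v], changed = dist[u] + c, ei, True
            if not changed:
                break
        return dist.get(T, inf), prev

    # Push one unit of flow at a time while a new track still lowers the total cost.
    while True:
        cost, prev = shortest_path()
        if cost >= 0:
            break
        v = T
        while v != S:
            ei = prev[v]
            edges[ei][1] -= 1
            edges[ei ^ 1][1] += 1
            v = edges[ei ^ 1][0]

    def next_node(u):
        # Follow the single saturated forward edge leaving u.
        for ei in graph[u]:
            if ei % 2 == 0 and edges[ei][1] == 0:
                return edges[ei][0]
        return None

    tracks = []
    for ei in graph.get(S, []):
        if ei % 2 == 0 and edges[ei][1] == 0:
            node, track = edges[ei][0], []
            while node != T:
                if node[0] == "in":
                    track.append((node[1], frames[node[1]][node[2]]))
                node = next_node(node)
            tracks.append(track)
    return tracks
```

For example, calling `min_cost_tracks([[(0, 0), (5, 5)], [(1, 0), (5, 4)], [(2, 0), (5, 3)]])` links the six detections into two three-frame tracks, while a single isolated detection with `det_cost=-1.0` is rejected because its entry and exit costs outweigh its reward. Because all capacities are one and the costs enter linearly, the flow found this way is integral, which is the property that lets such models be stated as integer linear programs with a globally optimal solution.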