Tracking pedestrians with visual sensors has many applications, among them autonomous driving. Besides obtaining high recall, maintaining the consistency of tracked trajectories during data association is one of the most crucial tasks of any tracker. This issue has been addressed in the literature for some time by exploiting geometry cues to improve the pairwise matching of detections across consecutive frames. However, existing studies have employed this idea only in a simple way and have not leveraged it thoroughly: they use only 2D information, which cannot fully capture real-world geometry in 3D space. Motivated by this observation, we present a new method called 3D-TLSR (3D pedestrian tracking using local structure refinement). We use stereo images and extend the idea of geometry cues to 3D space to improve the association of existing trajectories with new detections. We divide the assignment optimization into two steps: (1) determining trajectories whose assignments are strongly believed to be correct, which we call anchors, and (2) employing geometry constraints between the anchors and their nearby trajectories in 3D space to improve the matching of the less reliable assignments from the first step. In addition, we suggest a simple approach to compute and correct the velocity of a tracked person so that we can better recover missed detections. Experimental results on the well-known KITTI tracking benchmark, the ETHMS dataset, and a self-generated dataset show that our tracker yields results comparable to other state-of-the-art methods. On KITTI, it achieves a multi-object tracking accuracy (MOTA) of 54.00, which is the best online result among all investigated approaches, a multi-object tracking precision (MOTP) of 73.03, which is the best of all reported values, and a mostly tracked (MT) score of 29.55, which is the second-best result.
On the ETHMS dataset, our approach obtains the best results by large margins for recall, precision, and MT, while maintaining a reasonably low number of ID switches (IDs) and fragmentations (FG). These findings confirm the effectiveness of our proposed association method and velocity estimation approach.
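
The two-step assignment described above can be illustrated with a minimal sketch. It is not the 3D-TLSR implementation: the function name `associate`, the thresholds `anchor_gate`, `gate`, and `radius`, the greedy nearest-neighbour matching, and the offset-based cost are all assumptions chosen to keep the example short. Step 1 keeps only very close track-detection pairs as anchors; step 2 scores each remaining pair by how well the 3D offset to nearby anchors is preserved.

```python
import math


def associate(tracks, detections, anchor_gate=0.5, gate=2.0, radius=5.0):
    """Two-step association sketch (hypothetical, not the paper's algorithm).

    tracks:     {track_id: (x, y, z)} predicted 3D positions in metres
    detections: list of (x, y, z) detected 3D positions in metres
    Returns (anchors, matches), both {track_id: detection_index}.
    """
    # Step 1: greedy nearest-neighbour matching. Only pairs closer than
    # anchor_gate are accepted; these confident pairs act as anchors.
    pairs = sorted(
        (math.dist(p, d), tid, j)
        for tid, p in tracks.items()
        for j, d in enumerate(detections)
    )
    anchors, used = {}, set()
    for cost, tid, j in pairs:
        if cost > anchor_gate:
            break
        if tid not in anchors and j not in used:
            anchors[tid] = j
            used.add(j)

    # Step 2: for each remaining track, compare its 3D offsets to nearby
    # anchor tracks with the candidate detection's offsets to the
    # detections matched to those anchors (a local-structure cost).
    matches = dict(anchors)
    for tid, p in tracks.items():
        if tid in matches:
            continue
        near = [a for a in anchors if math.dist(p, tracks[a]) <= radius]
        best_j, best_cost = None, gate
        for j, d in enumerate(detections):
            if j in used:
                continue
            if near:
                cost = sum(
                    math.dist(
                        [pi - ai for pi, ai in zip(p, tracks[a])],
                        [di - bi for di, bi in zip(d, detections[anchors[a]])],
                    )
                    for a in near
                ) / len(near)
            else:
                # No anchor nearby: fall back to plain 3D distance.
                cost = math.dist(p, d)
            if cost < best_cost:
                best_j, best_cost = j, cost
        if best_j is not None:
            matches[tid] = best_j
            used.add(best_j)
    return anchors, matches


tracks = {1: (0.0, 0.0, 10.0), 2: (2.0, 0.0, 10.0), 3: (4.0, 0.0, 12.0)}
detections = [(0.1, 0.0, 10.0), (2.1, 0.0, 10.1),
              (4.9, 0.0, 12.2), (4.0, 0.0, 13.0)]
anchors, matches = associate(tracks, detections)
```

In this toy scene, tracks 1 and 2 become anchors, and track 3 is then assigned to the detection whose position relative to those anchors changes least. A real tracker would likely replace the greedy step with a global assignment solver and combine the geometric cost with appearance cues.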