Facial mesh tracking produces topologically consistent 3D facial meshes from stereo video captured by calibrated cameras. It is an integral part of many digital human applications, such as personalized avatar creation, audio-driven 3D facial animation, and talking face video generation. Most existing facial mesh tracking methods are built on computer graphics techniques that involve complex procedures and often require human annotation within the pipeline; as a result, they are difficult to implement and hard to generalize across scenarios. We propose a backpropagation-based solution, BPMT, that formulates facial mesh tracking as a differentiable optimization problem. Our solution leverages visual cues extracted from the stereo input to estimate vertex-wise geometry and texture. BPMT consists of two steps: automatic face analysis and mesh tracking. In the first step, a range of visual cues is automatically extracted from the input, including facial point clouds, multi-view 2D landmarks, 3D landmarks in the world coordinate system, motion fields, and image masks. The second step is cast as a differentiable optimization problem whose constraints comprise the stereo video input and the extracted facial cues. The objective is topologically consistent 3D facial meshes across frames, and the optimized parameters are the positions of free-form deformed vertices and a shared texture UV map. A 3D morphable model (3DMM) is introduced as regularization to improve the convergence of the optimization. Leveraging mature backpropagation software, we progressively register the facial meshes to the recorded subject, generating high-quality 3D faces with consistent topology. BPMT requires no manual labeling within the pipeline, making it suitable for producing large-scale stereo facial data.
Moreover, our method exhibits a high degree of flexibility and extensibility, positioning it as a promising platform for future research in the community.
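To make the second step concrete, the following is a minimal toy sketch of a mesh-registration update: free-form vertex positions are fitted to observed 3D points while being regularized toward a 3DMM prior mesh. The quadratic data and regularization terms, the function name `track_mesh`, and the use of plain gradient descent with an analytic gradient are all illustrative assumptions, not the paper's exact losses or optimizer.

```python
# Toy sketch of one BPMT-style optimization (assumed quadratic losses, not the
# paper's actual formulation): minimize a data term pulling vertices toward
# observed 3D points plus a 3DMM regularizer pulling them toward a prior mesh.
import numpy as np

def track_mesh(observed, prior, lam=0.1, lr=0.05, steps=200):
    """Gradient descent on vertex positions V minimizing
    ||V - observed||^2 + lam * ||V - prior||^2 (hypothetical loss)."""
    V = prior.copy()  # initialize free-form vertices from the 3DMM prior
    for _ in range(steps):
        # Analytic gradient of the quadratic loss; a real pipeline would use
        # automatic differentiation (backpropagation) instead.
        grad = 2.0 * (V - observed) + 2.0 * lam * (V - prior)
        V -= lr * grad
    return V

rng = np.random.default_rng(0)
prior = rng.normal(size=(5, 3))     # toy 3DMM prior vertices (N x 3)
observed = prior + 0.2              # toy per-vertex stereo observations
V = track_mesh(observed, prior)
# For this quadratic loss the minimum has the closed form
# (observed + lam * prior) / (1 + lam), which the iterates approach.
```

In the full method, the loss would also include multi-view landmark, motion-field, and photometric terms against the shared texture UV map, with gradients supplied by the backpropagation framework rather than derived by hand.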