Magnetic resonance imaging (MRI) is often used in speech production research. An important step is segmenting the vocal tract in the image sequence. However, as the vocal tract deforms, a segmentation landmark is likely to drift from its correct position and need reinitialization. During natural speech, the vocal tract may return to its initial position during the pause at the end of a sentence, which offers an opportunity to reset the landmark to its correct position. To detect these pauses in the image sequence, we used similarity measures to compare each frame with the first frame, which marks the beginning of a sentence, when the vocal tract is at rest. The measures include Structural Similarity (SSIM), Complex Wavelet Structural Similarity (CW-SSIM), Visual Information Fidelity in the Pixel domain (VIFP), and Peak Signal-to-Noise Ratio (PSNR), among others. We found that CW-SSIM outperformed the other measures. All of the similarity measures varied periodically during speech. CW-SSIM returned to a maximum of around 0.9, indicating that the vocal tract had returned to its initial position. The maxima of the other measures deviated greatly from 1, so CW-SSIM was the best candidate for detecting pauses.
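The frame-to-reference comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a single-window (global) SSIM and PSNR as stand-ins, because CW-SSIM requires a complex wavelet decomposition that is beyond a short example. The function names (`psnr`, `global_ssim`, `find_rest_frames`), the flat-list frame representation, and the 0.9 threshold (borrowed from the CW-SSIM maximum reported above) are all assumptions made for illustration.

```python
import math


def psnr(reference, frame, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, frame)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def global_ssim(reference, frame, max_value=255.0):
    """SSIM computed over the whole image as one window.

    This is a simplification: standard SSIM averages the same expression
    over many local windows, and CW-SSIM works on wavelet coefficients.
    """
    n = len(reference)
    mu_x = sum(reference) / n
    mu_y = sum(frame) / n
    var_x = sum((a - mu_x) ** 2 for a in reference) / n
    var_y = sum((b - mu_y) ** 2 for b in frame) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(reference, frame)) / n
    c1 = (0.01 * max_value) ** 2  # conventional stabilizing constants
    c2 = (0.03 * max_value) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def find_rest_frames(frames, metric=global_ssim, threshold=0.9):
    """Score every frame against frame 0 and return candidate pause frames.

    A candidate is a frame after the first whose score is at least
    `threshold` and is a local maximum of the similarity curve.
    Returns (candidate_indices, scores).
    """
    reference = frames[0]
    scores = [metric(reference, frame) for frame in frames]
    candidates = []
    for i in range(1, len(scores)):
        following = scores[i + 1] if i + 1 < len(scores) else -math.inf
        if scores[i] >= threshold and scores[i] >= scores[i - 1] and scores[i] > following:
            candidates.append(i)
    return candidates, scores


# Tiny synthetic "sequence": each frame is a flat list of four pixel values.
rest = [10, 50, 200, 120]
deformed = [200, 10, 50, 120]
mid = [(a + b) / 2 for a, b in zip(rest, deformed)]
frames = [rest, mid, deformed, mid, rest, mid, deformed]

candidates, scores = find_rest_frames(frames)
```

In this toy sequence the similarity curve rises back to its maximum only where the frame matches the rest configuration, which is the behavior the pause detection relies on. With real MRI frames, the landmark would be reset at the returned indices, and the metric could be swapped by passing a different function as `metric` (with a threshold suited to that metric's scale).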