Sir,

It was with great interest that we read the article titled “Feasibility study to assess clinical applications of 3T cine MRI coupled with synchronous audio recording during speech in evaluation of velopharyngeal insufficiency in children” by Sagar and Nimkin [1]. We believe dynamic real-time MRI will become a prevalent tool in studying velopharyngeal closure and, when associated with detailed anatomical scans, should provide added clinical information compared to the current imaging modalities, videofluoroscopy and nasendoscopy. In this regard, the authors must be commended for their detailed comparison of real-time MRI with those techniques. However, it is worth highlighting that contrary to what is stated in the article, many studies of velopharyngeal closure using real-time MRI have been published in the imaging literature, most of them with synchronised audio recording. Studies focussing specifically on velopharyngeal closure include but are not limited to those by Beer et al. [2], which compared the technique to X-ray videofluoroscopy, Bae et al. [3] at 3 T, and Scott et al. [4], which included data at both 1.5 and 3 T. The last two studies were carried out with simultaneous audio recording. There are far too many speech studies using real-time MRI to mention in a short letter, although the work conducted at the University of Southern California both on acquisition and analysis [5, 6] is worth highlighting. We strongly encourage the interested reader to refer to the recent review of the field [7].

When studying velopharyngeal closure, the dynamic frame rate is a key issue, and we are concerned that only two frames per second (fps) were used in the study by Sagar and Nimkin [1]. Although there is no doubt this is sufficient to study sustained phonation, it appears to be insufficient for speech studies. All the previously mentioned studies [2–6] were acquired at substantially faster rates (5–25 fps).
In our clinical experience, for various speech samples in normal phonation, all closure events are detected at rates around 10 fps [8]; however, some are already missed at 5 fps. A lower frame rate will increase both blurring and the number of missed closure events; it could potentially lead to an incorrect diagnosis of velopharyngeal insufficiency if closure is short and not sampled.

Higher frame rates (e.g., 30 fps [9] or more than 100 fps [10]) are achievable and can be required for linguistic studies and studies of co-articulatory events. However, such rates are obtained using non-Cartesian acquisitions, which have only recently been developed and are not necessarily available on standard scanners. Furthermore, they usually rely on delayed reconstruction, which is a limiting factor for conducting an interactive clinical speech study, as would be desirable for velopharyngeal closure assessment.

Real-time MRI of speech is a very active field of research, and new publications in this area are a welcome addition to the body of knowledge. However, a consensus on best practice for acquisition methodology is still to emerge, and until such time, we would recommend that newcomers err on the side of caution and try to obtain the best compromise between spatial and temporal resolution that can be achieved with their chosen acquisition method.
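
The frame-rate concern raised above can be illustrated with simple arithmetic. The following is a minimal sketch rather than a model of any of the cited acquisitions: it assumes each frame is an instantaneous snapshot taken at regular intervals with a random starting phase (so it ignores the blurring that also occurs at low frame rates), and the 100 ms closure duration is a purely hypothetical value chosen for illustration, not a measurement. The function name `capture_probability` is likewise our own.

```python
def capture_probability(closure_ms: float, fps: float) -> float:
    """Chance that at least one frame falls inside a closure event.

    Assumes instantaneous frames spaced 1000/fps ms apart, with the
    closure starting at a uniformly random point in the frame cycle.
    """
    frame_interval_ms = 1000.0 / fps
    return min(1.0, closure_ms / frame_interval_ms)


HYPOTHETICAL_CLOSURE_MS = 100.0  # illustrative assumption, not measured data

for fps in (2, 5, 10, 25):
    p = capture_probability(HYPOTHETICAL_CLOSURE_MS, fps)
    print(f"{fps:>2} fps: one frame every {1000.0 / fps:.0f} ms, "
          f"capture probability {p:.2f}")
```

Under these assumptions, a closure shorter than the frame interval can fall entirely between two consecutive frames, and the chance of missing it grows as the frame rate drops.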