This study investigated how the use of protective face masks and language experience shape the production of listener-oriented clear speech. One L1 and one L2 English talker read sentences in clear and conversational speaking styles, with and without a surgical mask. Formant trajectories between the onset and offset of two diphthongs, /aɪ/ and /eɪ/, were analyzed using Euclidean distance in the F1–F2 vowel space. For /aɪ/, the distance between the onset /a/ and the offset /ɪ/ was larger when speech was produced without the mask and when it was produced in the clear style. These modifications were larger for the L1 talker than for the L2 talker. The Euclidean distance for /eɪ/ was affected only by speaking style. The results suggest that talkers produced hyperarticulated diphthongs, characterized by larger formant movements, in clear speech, and that the presence of a mask limited jaw movement for the diphthong containing the low vowel. Additionally, the L1 talker made larger articulatory modifications in response to the mask and when producing listener-oriented clear speech than the L2 talker, who had less extensive experience with the target language. These production patterns may be related to the lower word recognition in noise found in previous work for masked, conversational, and L2 speech.
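As an illustration of the distance measure named above, the following minimal sketch computes the Euclidean distance between a diphthong's onset and offset in the F1–F2 plane. It assumes formants expressed in Hz and a single (F1, F2) measurement at each of the two time points; the function name `diphthong_distance` and the example values are hypothetical and are not taken from the study, which may have used a different frequency scale or measurement procedure.

```python
import math
from typing import Tuple

Formants = Tuple[float, float]  # (F1, F2); Hz is assumed here


def diphthong_distance(onset: Formants, offset: Formants) -> float:
    """Euclidean distance between onset and offset in the F1-F2 plane."""
    d_f1 = offset[0] - onset[0]  # change in F1 (vowel height / jaw opening)
    d_f2 = offset[1] - onset[1]  # change in F2 (vowel frontness)
    return math.hypot(d_f1, d_f2)


# Made-up values for illustration only (not data from the study):
# F1 falls by 300 Hz and F2 rises by 400 Hz between onset and offset.
print(diphthong_distance((700.0, 1300.0), (400.0, 1700.0)))  # 500.0
```

Under this measure, a larger value corresponds to a greater formant movement between the two vowel targets, which is how the abstract characterizes hyperarticulated diphthongs.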