The current study extends traditional perceptual high-variability phonetic training (HVPT) in a foreign language learning context. It implements a comprehensive training paradigm that combines perception (discrimination and identification) and production (immediate repetition) tasks, and it explores two potentially enhancing training conditions: the use of non-lexical training stimuli and the presence of masking noise during production training. We assessed training effects on L1-Spanish/Catalan bilingual EFL learners' production of a difficult English vowel contrast (/æ/-/ʌ/). The participants (N=62) were randomly assigned to either non-lexical (N=24) or lexical (N=24) training, and each training group was further subdivided into one group trained in noise (N=12) and one trained in silence (N=12). An untrained control group (N=14) was also tested. Training gains, measured as spectral distance scores (Euclidean distances) relative to native speakers' productions of /æ/ and /ʌ/, were assessed through delayed word and sentence repetition tasks. The results showed an advantage of non-lexical over lexical training; detrimental effects of noise for participants trained with nonwords, but not for those trained with words; and less accurate production of vowels elicited in isolated words than in words embedded in sentences, where training gains were only observable for participants trained with nonwords.
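
As a rough illustration of the spectral distance measure, the sketch below computes a Euclidean distance between a learner's vowel token and a native-speaker reference point, and expresses a training gain as the reduction in that distance from pre-test to post-test. It assumes a two-dimensional F1-F2 space; the abstract does not state which acoustic dimensions or normalization were used, so the dimensions, the function name `spectral_distance`, and all formant values are hypothetical and are not data from the study.

```python
import math


def spectral_distance(learner, native_mean):
    """Euclidean distance between two points in an assumed (F1, F2) space.

    Both points must use the same units (e.g. Hz or Bark).
    """
    return math.dist(learner, native_mean)


# Hypothetical formant values in Hz, for illustration only (not study data).
native_ae = (700.0, 1700.0)     # assumed native-speaker mean for /ae/
learner_pre = (640.0, 1500.0)   # assumed learner token before training
learner_post = (690.0, 1660.0)  # assumed learner token after training

# A positive gain means the learner's vowel moved closer to the native target.
gain = spectral_distance(learner_pre, native_ae) - spectral_distance(learner_post, native_ae)
print(f"distance reduction: {gain:.1f}")
```

Under this formulation a smaller distance indicates a more native-like production, so a gain is a decrease in distance between testing times.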