Background: It remains unclear what roles embodiment and conceptual metaphor play in second language (L2) lexical tone learning.
Aims: The present study examined the roles of the pitch gesture production (PP), pitch feature observation (PO), and word-picture association (WA) approaches in L2 lexical tone learning.
Sample: Participants were 90 undergraduate students with Mandarin as their native language.
Methods: Participants learned Thai lexical tones via the three approaches and completed tone discrimination, tone identification, and word-picture matching tasks.
Results: PP outperformed PO and WA in discriminating between and identifying specific tones. PP was also more accurate than WA in word-picture matching.
Conclusions: The embodiment involved in pitch gesture production was superior to the conceptual metaphor involved in pitch feature observation for learning L2 lexical tones. However, its role was affected by the pitch features of the lexical tones, the test tasks, and learners' tonal experience.