Study of the four covert kinematic variants of North American English flaps can provide insight into questions in speech motor control and articulation-acoustics relations [Derrick and Gick, Can. J. Linguist. 56(3), 307–319 (2011)]. These variants are typically labeled by human annotators from ultrasound video, a time-consuming and labor-intensive process. In this study, we present an automatic classification method that takes as input optical flow fields computed over a series of ultrasound frames flanking the flap; the fields are calculated using a method tailored to ultrasound video [Moisik et al., JIPA 44(1), 21–58 (2014)]. Two classifiers are compared: support vector machines, which learn a maximum-margin linear separator between labeled class instances; and simple recurrent neural networks [Elman, Cognit. Sci. 14(2), 179–211 (1990)], which operate recursively over sequences of data and whose state at any timepoint is calculated based only on the state at the preceding timepoint and the current input field. We train these classifiers on human-labeled flap tokens and test classification performance on a held-out subset of the labeled tokens. We go on to discuss the general applicability of this method for disambiguating covertly different lingual articulatory events.
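
As a concrete illustration of the feature-extraction step, the sketch below computes dense optical flow between consecutive ultrasound frames in Python. OpenCV's Farnebäck algorithm is used here only as a generic stand-in, not the ultrasound-tailored method of Moisik et al. (2014); the frame dimensions and the `flow_fields` helper are likewise illustrative assumptions.

```python
# Sketch: dense optical flow between consecutive ultrasound frames.
# Farnebäck flow is a generic stand-in for the study's ultrasound-tailored
# method (Moisik et al., 2014); sizes and helper names are hypothetical.
import cv2
import numpy as np

def flow_fields(frames):
    """frames: list of grayscale uint8 images; returns one flow array per frame pair."""
    fields = []
    for prev, curr in zip(frames[:-1], frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(
            prev, curr, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        fields.append(flow)  # shape (H, W, 2): per-pixel (dx, dy)
    return fields

# Hypothetical token: 11 frames flanking the flap -> 10 flow fields.
frames = [np.random.randint(0, 256, (64, 64), dtype=np.uint8) for _ in range(11)]
fields = flow_fields(frames)
```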
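
The SVM comparison could look roughly as follows, assuming each token is reduced to a fixed-length vector by concatenating its flow frames. The data shapes, the scikit-learn `LinearSVC` pipeline, and the 80/20 split are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch: classifying four covert flap variants from flattened
# optical-flow features with a linear SVM. All shapes are hypothetical.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Hypothetical data: n_tokens tokens, each T flow fields of D values.
n_tokens, T, D = 200, 10, 512
X_seq = rng.normal(size=(n_tokens, T, D)).astype(np.float32)
y = rng.integers(0, 4, size=n_tokens)       # four variant labels

# An SVM needs one fixed-length vector per token, so concatenate frames.
X = X_seq.reshape(n_tokens, T * D)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0))
clf.fit(X_tr, y_tr)                         # learn max-margin separators
print("held-out accuracy:", clf.score(X_te, y_te))
```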
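
For the simple recurrent network, PyTorch's `nn.RNN` implements the Elman update described above, with the hidden state at each timepoint computed from only the previous hidden state and the current input field. The layer sizes, readout, and training loop below are illustrative assumptions, not the study's architecture.

```python
# Minimal sketch of an Elman-style recurrent classifier over sequences of
# flow fields; torch.nn.RNN computes h_t = tanh(W_x x_t + W_h h_{t-1} + b).
import torch
import torch.nn as nn

class FlapSRN(nn.Module):
    def __init__(self, input_dim=512, hidden_dim=64, n_classes=4):
        super().__init__()
        self.rnn = nn.RNN(input_dim, hidden_dim, batch_first=True)
        self.readout = nn.Linear(hidden_dim, n_classes)

    def forward(self, x):                 # x: (batch, T, input_dim)
        _, h_last = self.rnn(x)           # h_last: (1, batch, hidden_dim)
        return self.readout(h_last[0])    # class scores from final state

model = FlapSRN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Hypothetical batch: 8 tokens, 10 frames each, 512 flow values per frame.
x = torch.randn(8, 10, 512)
y = torch.randint(0, 4, (8,))
for _ in range(5):                        # a few illustrative updates
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```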