Distributed acoustic sensors (DAS) are effective apparatuses that are widely used in many application areas for recording signals of various events with very high spatial resolution along optical fibers. To properly detect and recognize the recorded events, advanced signal processing algorithms with high computational demands are crucial. Convolutional neural networks (CNNs) are highly capable tools to extract spatial information and are suitable for event recognition applications in DAS. Long short-term memory (LSTM) is an effective instrument to process sequential data. In this study, a two-stage feature extraction methodology that combines the capabilities of these neural network architectures with transfer learning is proposed to classify vibrations applied to an optical fiber by a piezoelectric transducer. First, the differential amplitude and phase information is extracted from the phase-sensitive optical time domain reflectometer (Φ-OTDR) recordings and stored in a spatiotemporal data matrix. Then, a state-of-the-art pre-trained CNN without dense layers is used as a feature extractor in the first stage. In the second stage, LSTMs are used to further analyze the features extracted by the CNN. Finally, a dense layer is used to classify the extracted features. To observe the effect of different CNN architectures, the proposed model is tested with five state-of-the-art pre-trained models (VGG-16, ResNet-50, DenseNet-121, MobileNet, and Inception-v3). The results show that using the VGG-16 architecture in the proposed framework manages to obtain a 100% classification accuracy in 50 trainings and got the best results on the Φ-OTDR dataset. The results of this study indicate that pre-trained CNNs combined with LSTM are very suitable to analyze differential amplitude and phase information represented in a spatiotemporal data matrix, which is promising for event recognition operations in DAS applications.