In recent years, synthetic face video generation has advanced rapidly, and Deepfake techniques now produce hyper-realistic forged videos. This erodes trust in video content and enables malicious use when forged videos are spread on the internet. To date, only a few algorithms have been proposed for detecting forged videos created by Deepfake techniques, and most of them analyze or learn features from each frame of a video separately. Because these algorithms pay little attention to spatio-temporal features, their accuracy is usually limited. This paper proposes a 3-dimensional (3-D) convolutional neural network model that learns spatio-temporal features from a sequence of adjacent frames in a video. The proposed network's binary detection accuracy exceeds 99% on the two largest benchmark datasets, the Deepfake set of FaceForensics++ and VidTIMIT. In the experiments, the proposed method outperforms state-of-the-art methods.
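
To illustrate why a 3-D convolution can capture information that frame-by-frame analysis misses, the following minimal sketch applies a naive 3-D convolution to a toy clip. It is not the proposed network: the function `conv3d_valid`, the toy `clip`, and the `temporal_diff` kernel are hypothetical names and values chosen only for illustration, and a real model would learn its kernels and stack many such layers.

```python
# Illustrative sketch only: a naive "valid" 3-D convolution over a clip
# shaped (frames, height, width). This is NOT the network from the paper.
# Like deep learning frameworks, it computes cross-correlation (no kernel flip).

def conv3d_valid(clip, kernel):
    """Slide a (kt, kh, kw) kernel over a (T, H, W) clip, stride 1, no padding."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over time AND space: this is what makes the filter spatio-temporal.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Hypothetical toy clip: 3 frames of 2x2 pixels; one pixel flickers in the middle frame.
clip = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[1.0, 5.0], [1.0, 1.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]

# Hand-set kernel spanning 2 adjacent frames (1x1 in space): next frame minus current frame.
temporal_diff = [[[-1.0]], [[1.0]]]

response = conv3d_valid(clip, temporal_diff)
print(response)  # [[[0.0, 4.0], [0.0, 0.0]], [[0.0, -4.0], [0.0, 0.0]]]
```

The non-zero responses appear only where a pixel changes between adjacent frames, which is the kind of temporal inconsistency that a detector analyzing each frame in isolation cannot observe.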