We are living in an era in which daily interaction between individuals and businesses involves sending, uploading, and sharing videos as a means of communication and advertising. However, many users are unaware of the risks associated with opening a malicious video file, so it is no surprise that cyber-criminals have adopted this attack vector in recent years. MP4 is one of the most commonly used video formats, and its properties make it well suited to exploiting software vulnerabilities across multiple platforms, which can ultimately lead to a cyberattack. Because they rely on deterministic, signature-based techniques, antivirus solutions are limited in their ability to detect unknown malware, let alone zero-day attacks. Machine learning (ML) algorithms have been effective in detecting known and unknown malware across various file formats, domains, and platforms, but their performance relies heavily on the feature extraction methodology. To the best of our knowledge, there is no dedicated feature extraction methodology for MP4 files that generates a set of features for the task of unknown MP4 file malware detection. In this paper, we present three innovative and efficient feature extraction methodologies for unknown MP4 file malware detection. Two of them are file structure-based and one is knowledge-based. The methodologies are evaluated in a series of five experiments using six ML algorithms and 177 different datasets, which represent different configurations of feature extraction, representation, and selection. The datasets are based on a representative collection of 6,229 files: 5,066 benign files (∼81.3%) and 1,163 malicious files (∼18.7%). The first three experiments demonstrate the methodologies’ discrimination and generalization capabilities across multiple configurations, in terms of known and unknown MP4 file malware detection.
The fourth experiment shows that applying principal component analysis (PCA) to the features produced by the methodologies can reduce time and space complexity and improve feature resilience while maintaining strong detection and generalization capabilities. In the fifth experiment, the methodologies’ best performing configuration is compared to state-of-the-art, generic feature extraction methodologies, such as n-grams, MinHash, and representation and transfer learning (using a CNN), in the task of unknown MP4 file malware detection. The results show that our best performing configuration outperforms all of the compared state-of-the-art feature extraction methodologies, with an AUC, TPR, and FPR of 0.9951, 0.976, and 0.0, respectively.
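
To give a concrete sense of what a file structure-based feature might look like, the following is a minimal sketch that walks the top-level boxes of an MP4 (ISO base media file format) byte string and summarizes them. It is not the feature set proposed in this paper, whose details are not given in this abstract. The function names (`iter_top_level_boxes`, `structural_features`, `make_box`) and the chosen features (box-type counts, a malformed-box flag, and a count of trailing bytes) are illustrative assumptions only.

```python
import struct
from collections import Counter


def iter_top_level_boxes(data: bytes):
    """Yield (box_type, offset, size) per top-level box; size is -1 if malformed."""
    offset, n = 0, len(data)
    while offset + 8 <= n:
        # Each box starts with a 32-bit big-endian size and a 4-byte type.
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # size == 1 means a 64-bit "largesize" follows the type field.
            if offset + 16 > n:
                yield raw_type.decode("latin-1"), offset, -1
                return
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # size == 0 means the box extends to the end of the file.
            size = n - offset
        box_type = raw_type.decode("latin-1")
        if size < header or offset + size > n:
            # Declared size is inconsistent with the data: stop walking.
            yield box_type, offset, -1
            return
        yield box_type, offset, size
        offset += size


def structural_features(data: bytes) -> dict:
    """Hypothetical structural summary; NOT the paper's actual features."""
    boxes = list(iter_top_level_boxes(data))
    valid = [(t, s) for t, _, s in boxes if s >= 0]
    counts = Counter(t for t, _ in valid)
    return {
        "box_counts": dict(counts),
        "num_boxes": sum(counts.values()),
        "malformed": any(s < 0 for _, _, s in boxes),
        # Bytes not covered by any well-formed top-level box.
        "trailing_bytes": len(data) - sum(s for _, s in valid),
    }


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a synthetic box for demonstration purposes."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic example: three well-formed boxes, no real video content.
sample = (
    make_box(b"ftyp", b"isom\x00\x00\x02\x00")
    + make_box(b"free")
    + make_box(b"mdat", b"\x00" * 4)
)
features = structural_features(sample)
```

A dictionary like `features` would still need to be turned into a fixed-length numeric vector before it could be passed to an ML classifier. That step corresponds to the representation and selection choices evaluated in the experiments and is not sketched here.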