Abstract
Machine learning (ML) classifiers used for malware detection typically employ numerical representations of the content of each file when making malicious/benign determinations. However, there is also relevant information that can be gleaned from the context in which the file was seen which is often ignored. One source of contextual information is the file's location on disk. For example, a malicious file masquerading as a known benign file (e.g., a Windows system DLL) is more likely to appear suspicious if the detector can intelligibly utilize information about the path at which it resides. Knowledge of the file path information could also make it easier to detect files which try to evade disk scans by placing themselves in specific locations. File paths are also available with little overhead and can seamlessly be integrated into a multi-view static ML detector, potentially yielding higher detection rates at very high throughput and minimal infrastructural changes. In this work, we propose a multi-view deep neural network architecture, which takes feature vectors from the PE file content as well as corresponding file paths as inputs and outputs a detection score. We perform an evaluation on a commercial-scale dataset of approximately 10 million samples - files and file paths from user endpoints serviced by an actual security vendor. We then conduct an interpretability analysis via LIME modeling to ensure that our classifier has learned a sensible representation and examine how the file path contributes to change in the classifier's score in different cases. We find that our model learns useful aspects of the file path for classification, resulting in a 26.6% improvement in the true positive rate at a 0.001 false positive rate (FPR) and a 64.6% improvement at 0.0001 FPR, compared to a model that operates on PE file content only.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.