Abstract
Vulnerability detection is an effective means of maintaining cyberspace security. Machine learning methods have attracted much attention in software security because of their accuracy and automation. However, current research mainly focuses on in-domain vulnerability detection, where the training data and test data belong to the same domain. Because of differences in application scenarios, coding habits, and other factors, vulnerabilities in different software projects may follow different probability distributions. This discrepancy degrades the performance of machine learning methods when they are applied to a brand-new project. To address this cold-start problem, we propose VulGDA, a cross-domain vulnerability detection framework based on graph embedding and deep domain adaptation. It supports a variety of cross-domain settings, including the zero-shot setting, in which no labeled data from the target domain is available for training. VulGDA consists of two stages: graph embedding and domain adaptation. In the graph embedding stage, we transform source code samples into graph representations in which elements are directly connected according to their syntactic and semantic relationships. We then aggregate information from the neighbors and edges defined in the graph into real-valued vectors. Through graph embedding, VulGDA extracts comprehensive vulnerability features and mitigates the challenge of long-term dependencies. To address the discrepancy between training data and test data, domain adaptation is used to train a feature generator. This feature generator maps the graph embedding to a “deep” feature that is discriminative for vulnerability detection and invariant to the shift between domains. We perform systematic experiments to validate the effectiveness of VulGDA. The results show that combining graph embedding with deep domain adaptation improves VulGDA's performance in cross-domain vulnerability detection.
Compared with state-of-the-art methods, our method performs better under the cold-start condition.
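The abstract does not specify VulGDA's aggregation function or its domain-adaptation objective. The sketch below only illustrates the two general ideas under stated assumptions. First, it shows one form of neighbor-and-edge aggregation: each message is a neighbor's vector plus the edge's vector, messages are mean-aggregated, there are no learned weight matrices, and node vectors are mean-pooled into a graph embedding. Second, it shows a linear-kernel maximum mean discrepancy (MMD) between source and target embeddings as one example of a domain-shift penalty. MMD is a swapped-in technique and is not necessarily what VulGDA uses. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def scale(a: Vector, s: float) -> Vector:
    return [x * s for x in a]


def aggregate_step(
    node_feats: Dict[int, Vector],
    edges: List[Tuple[int, int, Vector]],
) -> Dict[int, Vector]:
    """Hypothetical message-passing round (no learned weights, an assumption).

    Each node receives (source-node vector + edge vector) along every incoming
    edge, adds the mean of those messages to its own vector, then applies ReLU.
    """
    inbox: Dict[int, List[Vector]] = {n: [] for n in node_feats}
    for src, dst, edge_feat in edges:
        inbox[dst].append(add(node_feats[src], edge_feat))
    updated: Dict[int, Vector] = {}
    for n, h in node_feats.items():
        msgs = inbox[n]
        if msgs:
            total = msgs[0]
            for m in msgs[1:]:
                total = add(total, m)
            h = add(h, scale(total, 1.0 / len(msgs)))
        updated[n] = [max(0.0, x) for x in h]  # ReLU nonlinearity
    return updated


def graph_embedding(
    node_feats: Dict[int, Vector],
    edges: List[Tuple[int, int, Vector]],
    rounds: int = 2,
) -> Vector:
    """Run several aggregation rounds, then mean-pool node vectors into one graph vector."""
    h = node_feats
    for _ in range(rounds):
        h = aggregate_step(h, edges)
    vecs = list(h.values())
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def linear_mmd(source: List[Vector], target: List[Vector]) -> float:
    """Squared distance between domain means (MMD with a linear kernel).

    Illustrative domain-shift penalty only; not claimed to be VulGDA's loss.
    """
    dim = len(source[0])
    mu_s = [sum(v[i] for v in source) / len(source) for i in range(dim)]
    mu_t = [sum(v[i] for v in target) / len(target) for i in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


# Toy code graph: 3 nodes with 2-dimensional features and labeled edges.
nodes = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
edges = [(0, 1, [0.5, 0.0]), (2, 1, [0.0, 0.5]), (1, 2, [0.0, 0.0])]
emb = graph_embedding(nodes, edges, rounds=2)
```

In a trained model, a feature generator would be optimized so that a detection loss stays low while a discrepancy term such as `linear_mmd` between source-project and target-project features also shrinks. This is what "discriminative for vulnerability detection, and invariant to the shift between domains" asks for.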