Abstract

Machine learning methods are widely used to identify common, repeatedly occurring bugs and code vulnerabilities. The performance of a machine-learned model is bounded by the quality and quantity of training data and the model’s capability in extracting and capturing the essential information of the problem domain. Unfortunately, there is a storage of high-quality samples for training code vulnerability detection models, and existing machine learning methods are inadequate in capturing code vulnerability patterns.We present Developer,11Developer = Detecting codEVulnerabilitiEs at the Large scale by learning from OPen sourcERepositories. a novel learning framework for building code vulnerability detection models. To address the data scarcity challenge, Developer automatically gathers training samples from open-source projects and applies constraints rules to the collected data to filter out noisy data to improve the quality of the collected samples. The collected data provides many real-world vulnerable code training samples to complement the samples available in standard vulnerable databases. To build an effective code vulnerability detection model, Developer employs a convolutional neural network architecture with attention mechanisms to extract code representation from the program abstract syntax tree. The extracted program representation is then fed to a downstream network – a bidirectional long–short term memory architecture – to predict if the target code contains a vulnerability or not. We apply Developer to identify vulnerabilities at the program source-code level. Our evaluation shows that Developer outperforms state-of-the-art methods by uncovering more vulnerabilities with a lower false-positive rate.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.