Abstract

Software engineering corpora often contain domain-specific topics and linguistic patterns that popular text analysis tools are not designed to accommodate. In this paper, we introduce ALPACA, a novel, customizable text analysis framework whose main function is to analyze topics and their trends in a text corpus. ALPACA allows users to define a topic with a few initial domain-specific keywords and expand it into a much larger set of similar topic words. This set of words can be further expanded into a set of self-contained phrases that describe the topic more precisely. ALPACA extracts these phrases by matching input sentences against linguistic patterns: long sequences that mix specific words and part-of-speech tags and that appear frequently in the corpus. We demonstrate ALPACA by continuing an analysis of CVE security reports, in which it detects a new topic: vulnerabilities of mobile devices. YouTube link: https://www.youtube.com/watch?v=UTcMYb2o1pU
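The abstract does not say how ALPACA measures word similarity or how it represents its linguistic patterns, so the following is only a minimal sketch of the two steps under stated assumptions. It assumes that topic expansion uses cosine similarity over word vectors (the toy vectors below are hypothetical, not learned from any corpus). It also assumes that a pattern is a list in which an element written `<TAG>` matches a part-of-speech tag and any other element matches a literal word. The names `expand_topic`, `extract_phrases`, and the threshold value are illustrative and are not ALPACA's actual API.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]
TaggedToken = Tuple[str, str]  # (word, part-of-speech tag)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_topic(seeds: Set[str], vectors: Dict[str, Vector],
                 threshold: float = 0.9) -> Set[str]:
    """Assumed expansion rule: add every vocabulary word whose similarity
    to at least one seed keyword reaches the threshold."""
    topic = set(seeds)
    for word, vec in vectors.items():
        if word in topic:
            continue
        if any(cosine(vec, vectors[s]) >= threshold for s in seeds if s in vectors):
            topic.add(word)
    return topic


def element_matches(element: str, token: TaggedToken) -> bool:
    """'<TAG>' matches on the POS tag; any other element matches the word itself."""
    word, tag = token
    if element.startswith("<") and element.endswith(">"):
        return tag == element[1:-1]
    return word.lower() == element.lower()


def extract_phrases(sentence: List[TaggedToken], pattern: List[str],
                    topic: Set[str]) -> List[str]:
    """Return each contiguous span matching the pattern that contains a topic word."""
    phrases = []
    n = len(pattern)
    for start in range(len(sentence) - n + 1):
        window = sentence[start:start + n]
        if all(element_matches(e, t) for e, t in zip(pattern, window)):
            words = [w for w, _ in window]
            if any(w.lower() in topic for w in words):
                phrases.append(" ".join(words))
    return phrases


# Hypothetical toy data for illustration only.
vectors = {
    "android": [0.9, 0.1, 0.0],
    "ios": [0.85, 0.15, 0.05],
    "smartphone": [0.8, 0.2, 0.1],
    "kernel": [0.1, 0.9, 0.2],
}
topic = expand_topic({"android"}, vectors, threshold=0.95)

sentence = [("Buffer", "NN"), ("overflow", "NN"), ("in", "IN"),
            ("Android", "NNP"), ("devices", "NNS"), ("allows", "VBZ"),
            ("attackers", "NNS")]
pattern = ["<NN>", "overflow", "in", "<NNP>", "<NNS>"]
phrases = extract_phrases(sentence, pattern, topic)

print(sorted(topic))  # ['android', 'ios', 'smartphone']
print(phrases)        # ['Buffer overflow in Android devices']
```

With the seed `android`, the close vectors for `ios` and `smartphone` join the topic while `kernel` does not. The single pattern then pulls out one self-contained phrase because it both matches the word/tag sequence and contains a topic word. Requiring a topic word inside the matched span is one plausible way to tie phrases back to the expanded topic; the paper may use a different criterion.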
