Abstract

Manuscript submitted: July 27, 2012. Revision completed: September 14, 2012. Accepted for publication: October 3, 2012.
* This research was supported by a research grant from the Chungbuk National University academic research support program for 2012 and by the Kyungpook National University research fund for 2012.
** School of Computer Science and Engineering, College of IT Engineering, Kyungpook National University
*** Department of Computer Engineering, College of Electrical and Computer Engineering, Chungbuk National University; corresponding author

Web applications are becoming hard to maintain because of the high complexity and duplication of their web pages. However, most research on code clones focuses on code hunks, and its targets are limited to a specific language. We therefore propose GSIM, a language-independent statistical approach that detects similar pages based on the scarcity and frequency of customized tokens. The tokens, obtained by splitting pages with a given set of separators, are defined as the atomic elements for calculating the similarity between two pages. This paper gives the domain definition for web applications and the algorithms for collecting tokens, building matrices, and calculating similarity. We also evaluated the approach with our GSIM tool in experiments on open source code. The results show the applicability of the proposed method and the effects of parameters such as threshold, toughness, and token length on result quality and performance.
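
The abstract does not give GSIM's formulas, so the following is only a minimal sketch of the general idea (split pages on separators, weight tokens by scarcity and frequency, compare pages pairwise against a threshold). The separator set, the function names, the IDF-style scarcity weight, and the cosine measure are all assumptions made for illustration and are not taken from the paper; the "toughness" parameter is not modeled because the abstract does not define it.

```python
import math
import re
from collections import Counter

# Hypothetical separator set; the paper's actual separators are not given.
SEPARATORS = " \t\n<>=\"'/;(){}"


def tokenize(page, separators=SEPARATORS, min_length=2):
    """Split a page on any run of separator characters, dropping short tokens."""
    pattern = "[" + re.escape(separators) + "]+"
    return [t for t in re.split(pattern, page) if len(t) >= min_length]


def scarcity_weights(token_lists):
    """Assumed IDF-style weight: tokens found in fewer pages weigh more."""
    n = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def similarity(tokens_a, tokens_b, weights):
    """Cosine similarity of frequency-times-scarcity vectors (an assumption)."""
    va = {t: c * weights.get(t, 1.0) for t, c in Counter(tokens_a).items()}
    vb = {t: c * weights.get(t, 1.0) for t, c in Counter(tokens_b).items()}
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def similar_pairs(pages, threshold=0.8):
    """Return (name_a, name_b, score) for page pairs scoring at least threshold."""
    names = list(pages)
    tokens = {name: tokenize(pages[name]) for name in names}
    weights = scarcity_weights(list(tokens.values()))
    result = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = similarity(tokens[a], tokens[b], weights)
            if score >= threshold:
                result.append((a, b, score))
    return result
```

Because the tokenizer relies only on a separator set and never on a grammar, the same sketch applies unchanged to HTML, PHP, JSP, or SQL files, which is the sense in which such an approach is language-independent. Raising `threshold` or `min_length` trades recall for fewer, stronger matches.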
