Abstract
Websites containing harmful information such as gambling, illegal drugs, pornography, and prostitution have recently been exposed to the general public. These harmful sites damage copyright holders and related service industries and cause various social problems. In this paper, we propose an image-based harmful site identification system that uses OCR and Average Hash techniques to identify and classify harmful sites. The system exploits the fact that most gambling banner advertisements repeatedly use similar images, and it measures similarity by comparing the average hash values of banner advertisement images. In addition, it uses EasyOCR to determine whether the phrases written in a banner advertisement are harmful. To evaluate the proposed idea, we built a program that, given a site name, collects and analyzes the site's banner advertisement images and judges whether the site is harmful; the discrimination accuracy was confirmed to be 84%. Furthermore, because the information collected while the program runs is stored in a database, trends in harmful sites can be identified. This information is expected to be useful in searching for harmful sites that are likely to appear in the future.
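The two checks described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the banner image has already been downscaled to a small grayscale grid (8x8 here) and that the banner text has already been extracted by an OCR step such as EasyOCR. The function names, the distance threshold of 10 bits, and the keyword list are all hypothetical choices made for illustration.

```python
def average_hash(pixels):
    """Compute an average hash from a 2D grid of grayscale values.

    Assumption: `pixels` is an already downscaled image (e.g. 8x8),
    given as a list of rows of numbers.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        # Each pixel contributes one bit: 1 if brighter than the mean.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_similar(hash_a, hash_b, max_distance=10):
    """Treat two banners as similar if few bits differ.

    Assumption: the threshold of 10 bits is illustrative only.
    """
    return hamming_distance(hash_a, hash_b) <= max_distance


def contains_harmful_phrase(text, keywords):
    """Case-insensitive substring check on OCR-extracted text.

    Assumption: `keywords` is a hypothetical list of harmful terms.
    """
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)
```

Because each bit depends only on whether a pixel is above the image's own mean, a uniformly brightened copy of a banner yields the same hash, which is one reason this kind of hash suits banners that are reused with minor changes.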
More From: THE TRANSACTION OF THE KOREAN INSTITUTE OF ELECTRICAL ENGINEERS P