Abstract

Users expect search engines to return relevant content from legitimate web pages when they enter key phrases. Unfortunately, because of the way a crawler or spider (also known as a web robot) navigates pages, it retrieves the requested content along with a large amount of vulnerable data. Search engines simply return pages according to the index values assigned to the visited links, without detecting the hidden intrusive content and web links on malicious pages. In earlier attacks, spammers relied on spam text; in response, many text-based filtering tools were introduced to analyze it. Because image content is much harder to extract, attackers shifted their methods to images once spam text could be identified: they embedded spam content in images and spread them to victims' web pages or email addresses. These images are called spam images. An Optical Character Recognition (OCR) system scans such images to extract their text, and the extracted content is matched against spam text databases. Images whose matched content exceeds a threshold are identified as spam images and subjected to remedial action. Once spam images in turn stopped working, attackers began to target victims' data and systems with non-spam images instead. They embed malicious web links and erroneous content in the images and place them on legitimate web pages in the form of advertisements or other content that appeals to users' desires. Navigating to these compromised sites leads to DoS attacks and to data and security breaches on the victim's system. This paper discusses web robots and their architecture, web content analysis, and the identification of intrusive content in bogus images on web pages.
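The threshold-based matching step described above can be illustrated with a minimal sketch. It assumes the OCR stage has already produced a plain-text string; the term list `SPAM_TERMS`, the function names, and the default threshold of 0.3 are hypothetical choices made for illustration and are not taken from the paper.

```python
import re

# Hypothetical spam vocabulary; a real system would query a maintained
# spam text database instead of a hard-coded set.
SPAM_TERMS = {"lottery", "winner", "prize", "free", "click"}


def spam_score(extracted_text, spam_terms=SPAM_TERMS):
    """Return the fraction of tokens in the OCR output found in the spam term list."""
    tokens = re.findall(r"[a-z0-9]+", extracted_text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in spam_terms)
    return hits / len(tokens)


def is_spam_image(extracted_text, threshold=0.3):
    """Flag an image as spam when its matched-term ratio reaches the threshold.

    The threshold value is an assumption; it would need tuning on real data.
    """
    return spam_score(extracted_text) >= threshold
```

A check of this kind looks only at the text recovered from the image, so an image whose visible text is harmless but which carries a malicious link would pass it, which is the gap the non-spam image attacks described above exploit.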
