A Survey of Bio Inspired Algorithms for Web Information Extraction and Optimization for Big Data Analytics

CSE, GSSSIETW, Mysore, India, Manjunatha Swamy C, Dr S Meenakshi Sundaram

doi:10.35940/ijeat.b2011.1210220

Open Access

https://doi.org/10.35940/ijeat.b2011.1210220

Abstract

Information extraction is the systematic process of extracting structured information from documents that contain both unstructured and semi-structured data. Data available over the web is largely unstructured, and processing and delivering it is challenging because of its massive volume. Big data analytics is the approach used in computing to manage and process such massive data as information. Data from various sources, such as industries and institutes, is processed efficiently by algorithms, and the Web of Things or Internet of Things is used to mine such large data. Bio-inspired algorithms have evolved from heuristic approaches to meta-heuristic and hyper-heuristic methodologies. Bio-inspired techniques are categorized into human-inspired algorithms, swarm intelligence algorithms, evolutionary algorithms and ecology-based algorithms. Genetic algorithms are purely heuristic in nature and are employed for computation and for extracting information from big data. Because an evolutionary algorithm resolves information extraction problems, this improves the computation speed for extracting web-related information. The Ant Colony and Particle Swarm Intelligence algorithms are meta-heuristic in nature. The Cuckoo Search, Artificial Bee Colony, Firefly and Bat algorithms are hyper-heuristic in nature, i.e., they employ a combination of methods. Web information extraction using bio-inspired concepts and genetic operators increases the efficiency and capability of searching for particular information in massive web data. Tools available for data extraction and mining include DataMelt, Apache Mahout, Weka, Orange and RapidMiner, which enhance web data extraction efficiency. This survey of bio-inspired methodologies can be extended to parameter tuning and control, another major strategy that can be implemented in addition to convergence speed-up.

Highlights

Web information extraction in a massive warehouse is not easy
Bat algorithm [10] is a meta-heuristic optimization algorithm, categorized as bio-inspired, that is based on the behavior of bats
Bat Algorithm for Web Information Extraction: the flow chart for the implementation of the Bat Algorithm is given in figure 3.2.
Step 1: Assign position, velocity and parameters with frequency
Step 2: Increment velocity and location each time based on the equation
Step 3: Increment position and find the fitness value in locating prey based on distance
Step 4: Based on the loudness factor and pulse rate, identify or avoid obstacles to get the best position
Step 5: Check whether the algorithm is done; otherwise go back to step 2
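
The five steps of the Bat Algorithm can be illustrated with a minimal sketch. It is not the implementation from figure 3.2, which is not reproduced here; it follows the commonly published formulation of the bat algorithm (frequency-tuned velocity update, a local walk gated by pulse rate, acceptance gated by loudness). The function names `sphere` and `bat_algorithm`, the toy fitness function, and all parameter values (`f_min`, `f_max`, `alpha`, `gamma`, `r0`, the bounds) are assumptions chosen for illustration only. In a web extraction setting the fitness function would instead score candidate extraction rules or features.

```python
import math
import random


def sphere(x):
    """Toy fitness (an assumption): squared distance from the origin, lower is better."""
    return sum(v * v for v in x)


def bat_algorithm(fitness, dim=2, n_bats=20, n_iter=100,
                  lower=-5.0, upper=5.0, f_min=0.0, f_max=2.0,
                  alpha=0.9, gamma=0.9, r0=0.5, seed=0):
    """Minimise `fitness`; returns (best_position, best_fitness)."""
    rng = random.Random(seed)

    def clamp(v):
        # Keep every coordinate inside the search bounds.
        return max(lower, min(upper, v))

    # Step 1: assign position, velocity, loudness and pulse rate to each bat.
    pos = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_bats)]
    vel = [[0.0] * dim for _ in range(n_bats)]
    loud = [1.0] * n_bats
    rate = [r0] * n_bats
    fit = [fitness(p) for p in pos]
    best_i = min(range(n_bats), key=fit.__getitem__)
    best_pos, best_fit = pos[best_i][:], fit[best_i]

    for t in range(1, n_iter + 1):
        mean_loud = sum(loud) / n_bats
        for i in range(n_bats):
            # Step 2: draw a frequency, then update velocity and location
            # (velocity term as written in the commonly published formulation).
            freq = f_min + (f_max - f_min) * rng.random()
            vel[i] = [v + (x - b) * freq
                      for v, x, b in zip(vel[i], pos[i], best_pos)]
            cand = [clamp(x + v) for x, v in zip(pos[i], vel[i])]

            # Step 4 (pulse rate): sometimes walk locally around the best bat.
            if rng.random() > rate[i]:
                cand = [clamp(b + rng.uniform(-1.0, 1.0) * mean_loud)
                        for b in best_pos]

            # Step 3: evaluate the fitness of the candidate position.
            cand_fit = fitness(cand)

            # Step 4 (loudness): accept a non-worse candidate with a
            # probability tied to loudness, then get quieter and pulse faster.
            if cand_fit <= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, cand_fit
                loud[i] *= alpha
                rate[i] = r0 * (1.0 - math.exp(-gamma * t))

            if fit[i] < best_fit:
                best_pos, best_fit = pos[i][:], fit[i]
        # Step 5: the loop returns to step 2 until n_iter iterations are done.

    return best_pos, best_fit


if __name__ == "__main__":
    position, value = bat_algorithm(sphere, seed=1)
    print(position, value)
```

The run is deterministic for a fixed `seed`, which makes it easier to compare parameter settings such as `alpha` and `gamma` when experimenting with parameter tuning.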

Summary

INTRODUCTION

Web information extraction in a massive warehouse is not easy. Information collected from a variety of sources is in digital form and may be structured, unstructured or semi-structured. Algorithms are used to process the data and extract information. Genetic Algorithm [5] is an evolutionary algorithm consisting of functions that represent data as a model to handle efficiency and scalability issues, with a fitness function that

Optimization

LIMITATIONS

WEB INFORMATION EXTRACTION AND OPTIMIZATION USING BAT ALGORITHM AND

Genetic algorithm for web Information Extraction

CONCLUSION & FUTURE WORK