Abstract

The steady growth in the amount of digital information publicly available on computer networks around the world has made it difficult for users to find what they really need at any given time. Information Retrieval Systems were designed to locate the required information, but they expose a large number of configuration options and are difficult to administer. Apache Nutch is a free spider (web crawler) with significant advantages for collecting and finding information on the web; however, it lacks a system that allows it to be configured visually, without console commands, and that supports working with multiple instances simultaneously. The Orion search engine was developed at the University of Informatics Sciences in Havana, Cuba, but it has many shortcomings that hinder the process of setting up its Nutch-based crawling mechanism. This paper presents the essential elements considered in implementing a system that improves usability and eases the work of administrators in configuration tasks. Through web interfaces, the implemented system provides features that increase control over configuration changes and streamline the process; it also provides information about the settings that was previously impossible or difficult to obtain.

Keywords: Apache Nutch, configuration, Information Retrieval System, Orion, web interface.

Digital Object Identifier (DOI): http://dx.doi.org/10.18687/LACCEI2015.1.1.027
ISBN: 13 978-0-9822896-8-6
ISSN: 2414-6668

Configuration system for the Apache Nutch spider: practical application in the Orion search engine

Yulio Aleman Jimenez, Yoniel Jorge Thomas Sosa, Aylin Estrada Velazco, Eyeris Rodriguez Rueda
University of Informatics Sciences, La Habana, Cuba.
yulioaj@uci.cu, yjthomas@uci.cu, erueda@uci.cu, avelazco@uci.cu
