Editor: Janice Whatley

Introduction

Nowadays the web is one of the most important sources of educational material, giving students and teachers a large amount of information at their disposal. To retrieve this information, people use search engines which, unfortunately, in many cases either do not return the desired information or return too many web pages. Learning objects are intended to help with the storage, classification, and reuse of educational resources. A Learning Object (LO) is any digital resource that can be reused to support learning (Wiley, 2002). LOs can be used by a student who wants to learn a subject or by a teacher who wants to prepare materials for his/her class. LOs are described with metadata, usually following the IEEE LOM standard (Learning Object Metadata: http://ltsc.ieee.org/wg12), and they are stored in different repositories. Examples of such repositories are FLOR (www.laclo.org), Ariadne (www.ariadne-eu.org), and OER Commons (www.oercommons.org). There are other standards, such as Dublin Core, maintained by the Dublin Core Metadata Initiative, or DCMI (dublincore.org), an open organization supporting innovation in metadata design and best practices across the metadata ecology. Dublin Core is not focused on educational metadata, but DCMI maintains a number of formal and informal relationships with different standards bodies, and Dublin Core is widely used. Other standards related to educational metadata are IMS Learning Object Metadata, IMS LOM (http://www.imsglobal.org/metadata), the Canadian Core Learning Resource Metadata Protocol, CanCore (http://www.cancore.ca), and the UK Learning Object Metadata Core, UK LOM (http://www.cetis.ac.uk/profiles/uklomcore/uklomcore_v0p2_may04.doc). We decided to use IEEE LOM because it is one of the most widely used standards in the learning object community. Users retrieve LOs through searches in web repositories, so high-quality metadata is key to successful retrieval.
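To make the idea of LOM metadata more concrete, the following Python sketch builds a small record containing a few elements from the General and Educational categories, using only the standard library's xml.etree.ElementTree. It is illustrative only: the namespace `http://ltsc.ieee.org/xsd/LOM` and the element names are given as we recall them from the IEEE LOM XML binding and should be checked against the schema a given repository actually validates against. The helper names (`build_lom_record`, `_langstring`) and the sample values are hypothetical, and the output is not claimed to be a complete or schema-valid LOM record.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the IEEE LOM XML binding; verify against your repository's schema.
LOM_NS = "http://ltsc.ieee.org/xsd/LOM"

# Serialize LOM elements with a default xmlns instead of generated "ns0:" prefixes.
ET.register_namespace("", LOM_NS)


def _q(tag: str) -> str:
    """Qualify a tag name with the (assumed) LOM namespace."""
    return f"{{{LOM_NS}}}{tag}"


def _langstring(parent: ET.Element, tag: str, text: str, lang: str = "en") -> ET.Element:
    """Add a LangString-style element: <tag><string language="..">text</string></tag>."""
    el = ET.SubElement(parent, _q(tag))
    s = ET.SubElement(el, _q("string"), {"language": lang})
    s.text = text
    return el


def build_lom_record(title: str, description: str, keywords: list,
                     resource_type: str, language: str = "en") -> ET.Element:
    """Build a minimal, illustrative record with General and Educational elements only."""
    lom = ET.Element(_q("lom"))

    # General category: title, language, description and one element per keyword.
    general = ET.SubElement(lom, _q("general"))
    _langstring(general, "title", title, language)
    ET.SubElement(general, _q("language")).text = language
    _langstring(general, "description", description, language)
    for kw in keywords:
        _langstring(general, "keyword", kw, language)

    # Educational category: learning resource type as a source/value vocabulary pair.
    educational = ET.SubElement(lom, _q("educational"))
    lrt = ET.SubElement(educational, _q("learningResourceType"))
    ET.SubElement(lrt, _q("source")).text = "LOMv1.0"
    ET.SubElement(lrt, _q("value")).text = resource_type

    return lom


# Hypothetical example learning object.
record = build_lom_record(
    "Introduction to fractions",
    "Exercises on adding fractions.",
    ["fractions", "arithmetic"],
    "exercise",
)
print(ET.tostring(record, encoding="unicode"))
```

Each field that a repository search can match on (title, keywords, resource type) is an explicit element here, which is why missing or poor values in such records directly limit what retrieval can find.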
Recommender systems based on metadata and user profiles have emerged to help people retrieve the resources most appropriate to their needs and preferences (Casali, Deco, Bender, & Gerling, 2012). Nevertheless, preparing learning resources with suitable metadata is labor-intensive and, consequently, these metadata often lack quality information. Michael Sonntag (2004) analyzes the importance of metadata for learning objects, as these resources may be reused often and possibly in different contexts. Among the problems he points out are the lack of metadata, the diversity of standards, and search engine support. Thus, the development of automatic or semi-automatic metadata extraction systems seems to be a very important step towards solving this problem. To date, there has been relatively little work on automatic metadata extraction, and each existing extraction tool has its own objectives and architecture and uses different techniques. Examples of such extraction systems can be found in Alfano, Lenzitti, and Visalli (2007), Li, Dorai, and Farrell (2005), Motz et al. (2009), Pire, Espinase, Casali, and Deco (2011), and Tang (2007). In addition, there are some metadata editors. For example, the Eureka project (http://eureka.ntic.org) is an initiative that provides a collective catalog of teaching and learning resources gathered by various organizations involved in the production of ICT educational resources. Eureka's shell is based on open source code, and its data can be federated with other repositories built on a LOM application profile. Another project is the Advanced Learning Object Hub Application, ALOHA 1.3 (http://aloha.netera.ca/). The Learning Commons at the University of Calgary has done extensive research and development of tools and techniques to deal with workflow productivity issues related to content repurposing and metadata indexing. ALOHA's flexible interface is friendly for amateur users and customizable for professional indexers. …
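As a rough illustration of what automatic or semi-automatic extraction can mean at its very simplest, the Python sketch below pulls a few candidate LOM General fields (title, description, keywords, language) out of an HTML page's `<title>` element, `<meta>` tags and `lang` attribute, and then lists the fields it could not fill so that a human indexer could complete them. This is a hypothetical heuristic written for this illustration, not the technique of any of the cited extractors or editors; the class and function names (`MetadataExtractor`, `extract_metadata`, `missing_fields`) and the sample page are assumptions.

```python
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Collect candidate LOM General fields from an HTML page (heuristic sketch)."""

    def __init__(self):
        super().__init__()
        self.fields = {"title": None, "description": None, "keywords": [], "language": None}
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and attrs.get("lang"):
            # The page's declared language is a candidate for general.language.
            self.fields["language"] = attrs["lang"]
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = (attrs.get("content") or "").strip()
            if name == "description" and content:
                self.fields["description"] = content
            elif name == "keywords" and content:
                # Split a comma-separated keyword list, dropping empty entries.
                self.fields["keywords"] = [k.strip() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            # Collapse runs of whitespace inside the title text.
            self.fields["title"] = " ".join("".join(self._title_parts).split()) or None

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_metadata(html: str) -> dict:
    """Run the extractor over one HTML document and return the candidate fields."""
    parser = MetadataExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields


def missing_fields(fields: dict) -> list:
    """Fields the extractor could not fill; a person would complete these by hand."""
    return [name for name, value in fields.items() if not value]


# Hypothetical example page.
page = """<html lang="en"><head><title> Adding  fractions </title>
<meta name="description" content="Worked examples on adding fractions.">
<meta name="keywords" content="fractions, arithmetic, ">
</head><body>...</body></html>"""

meta = extract_metadata(page)
print(meta)
print(missing_fields(meta))
```

The fields reported by `missing_fields` are the ones a semi-automatic workflow would hand over to a person, for instance through a metadata editor, instead of leaving them empty in the repository.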