Abstract
Web resources have been gaining popularity as providers of relevant data, whether stored in datasets or produced by executing complex functions such as the alignment of protein sequences. Although the discovery of web resources has been extensively studied, it remains a challenging research task because current search engines depend heavily on the characteristics of the available metadata. In some domains, such as the Life Sciences, this dependency is aggravated by the heterogeneity of data. Current web resource registries allow users to search for resources that fulfill their information needs. Discovery in these registries is mainly based on well-defined metadata, which is usually limited and very specific, and on string matching of the user's query keywords, which is hampered by the heterogeneity of data. The main objective of this thesis is to assist users in discovering the most appropriate resources for their information needs, specifically in the Life Sciences domain. Achieving this objective requires addressing the main limitations of current web resource registries. Firstly, web resource discovery is driven by the user's requirements and, therefore, the precision of its results depends on how well the user's information needs are described in the requirements specification. Thus, richer requirements specifications are expected to yield more precise results. In the proposed approach, the requirements specification consists of a rich description of both the functionality and the relevant features of the required resource. Additionally, users can customize the discovery parameters in order to improve the accuracy of the process. Secondly, discovery depends heavily on the characteristics of the resources' metadata.
In many registries, resources are described with well-defined metadata, e.g., categories, and with textual descriptions, which provide richer information but are harder to process automatically. In order to alleviate this dependency, this thesis proposes a normalization process that addresses the heterogeneity of data and automatically identifies relevant information implicitly described in the resources' metadata. The discovery of web resources then operates on the normalized data, reducing word mismatches, alleviating the problem of differing vocabularies, and improving the characterization of resources. Finally, whereas current registries provide the user with a list of resources without any information about their relevance to her requirements, the proposed approach presents the user with a list of resources ranked according to how well they fulfill her information needs and satisfy the user-defined features. In this way, the system assists the user until the end of the discovery process, providing her with information relevant to selecting the best-suited resource. The experimental evaluation performed on each phase of the discovery method demonstrates that the proposed techniques obtain good results. Moreover, the discovery method has been implemented as part of BioUSeR, an online tool for the discovery of Life Sciences web resources. In BioUSeR, the results of each phase of the discovery process are visualized, and the parameters and the data involved in the process can be easily customized by the user. We have used BioUSeR to demonstrate the usefulness of our approach with real usage examples.
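The idea of normalizing metadata before matching, and then ranking resources by how well they fulfill the request, can be illustrated with a minimal sketch. It is not the method implemented in BioUSeR: the synonym table, the resource names and descriptions, and the functions `normalize` and `rank` are all hypothetical, and the score used here (the fraction of query terms matched) is a simple stand-in for whatever measures the thesis actually defines.

```python
import re

# Hypothetical synonym table mapping vocabulary variants to one canonical term.
# A real system would draw such mappings from domain vocabularies or ontologies.
SYNONYMS = {
    "alignment": "align",
    "aligning": "align",
    "aligns": "align",
    "proteins": "protein",
    "sequences": "sequence",
    "peptide": "protein",
}


def normalize(text):
    """Lowercase, tokenize, and map each token to its canonical term."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {SYNONYMS.get(tok, tok) for tok in tokens}


def rank(query, resources):
    """Return (name, score) pairs, best first.

    The score is the fraction of normalized query terms that also occur
    in the normalized resource description (an illustrative measure only).
    """
    q = normalize(query)
    scored = []
    for name, description in resources.items():
        matched = q & normalize(description)
        score = len(matched) / len(q) if q else 0.0
        scored.append((name, score))
    # Sort by descending score, breaking ties by name for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical registry entries, invented for this example.
resources = {
    "ToolA": "Aligns peptide sequences against a reference database",
    "ToolB": "Retrieves gene expression profiles",
    "ToolC": "Pairwise alignment of DNA sequences",
}

ranking = rank("protein sequence alignment", resources)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy example, plain string matching would find none of the query keywords in the description of ToolA, since it uses "aligns", "peptide", and "sequences". After normalization all three query terms match, so ToolA is ranked first, which is the kind of vocabulary mismatch the normalization step is meant to reduce.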