Abstract

In this issue, Markey presents the first part of a two-part article that reviews 25 years of published research findings on end-user searching in online information retrieval (IR) systems. The author seeks to answer the following questions: What characterizes the queries that end users submit to online IR systems? What search features do people use? What features would enable them to improve on the retrievals they have in hand? What features are rarely used? What do end users do in response to the system's retrievals? Are end users satisfied with their online searches?

Wacholder et al. describe a procedure for quantitative evaluation of interactive question-answering systems, and illustrate the procedure with application to the High-Quality Interactive Question-Answering (HITIQA) system. The objectives of the study were (1) to design a method to realistically and reliably assess interactive question-answering systems by comparing the quality of reports produced using different systems, (2) to conduct a pilot test of the method, and (3) to perform a formative evaluation of the HITIQA system. The authors conclude that the method, which uses a small number of subjects and does not rely on predetermined relevance judgments, measures the impact of system change on work produced by users, and that the method can therefore be used to compare the product of interactive systems that use different underlying technologies.

Bornmann and Daniel investigate the influence of the number of articles reporting the results of a single study on its reception in the scientific community. The data set consisted of 96 applicants for a research fellowship from the Boehringer Ingelheim Fonds, an international foundation for the promotion of basic research in biomedicine. The applicants reported all articles that they had published within the framework of their doctoral research projects. On this single project, the applicants had published from 1 to 16 articles. The results of a regression model with an interaction term show that the practice of multiple publication of research study results does in fact lead to greater reception of the research (as indicated by total citation counts) in the scientific community. However, reception is dependent on the length of articles; the longer the article, the more total citation counts increase with the number of articles.

Dominich and Kiezer begin by discussing a recognized problem with the vector space model of information retrieval; i.e., that the model does not actually follow from the mathematical concepts on which it has been claimed to rest. The authors then propose a solution to this problem. First, the concept of retrieval is defined based on mathematical measure theory. Then, retrieval is particularized using fuzzy set theory. As a result, the retrieval function is conceived as the cardinality of the intersection of two fuzzy sets. This view makes it possible to build a connection to linear spaces. It is shown that the classical and generalized vector space models, as well as the latent semantic indexing model, gain a correct formal background with which they are consistent. At the same time, it becomes clear that the inner product is not a necessary ingredient of the vector space model, and hence of information retrieval (IR). The Principle of Object Invariance is introduced to handle this situation. This view makes it possible to consistently formulate new retrieval methods: in linear space with general basis, entropy-based, and probability-based. It is also shown that IR may be viewed as integral calculus and may be conceived as an application of mathematical measure theory.

Markey presents the second part of a two-part article that reviews 25 years of published research findings on end-user searching in online information retrieval (IR) systems. In this article, the author picks up the discussion of research findings about end-user searching in the context of current IR models. These models demonstrate that IR is a complex event, involving changes in cognition, feelings and/or events during the information seeking process. The author challenges IR researchers to design new studies of end-user searching, collecting data not only on system-feature use, but on multiple search sessions and controlling for variables such as domain knowledge expertise and expert system knowledge. Because future IR systems designers are likely to improve the functionality of online IR systems in response to answers to the new research questions posed in this article, the author concludes with advice to these designers about retaining the simplicity of online IR system interfaces.

Kari and Hartel discuss lower and higher contexts for information phenomena, and argue that there is a need for a more concerted research effort in the latter sphere. The discipline of information science has traditionally favored lower contexts (like everyday life or problem solving) that are neutral or even negative by nature. In contrast, the neglected higher things in life are pleasurable or profound phenomena, experiences, or activities that transcend the daily grind. The authors outline a contextual research area in information studies to address higher things from the perspective of information. They conclude that optimal functioning requires bringing the lower and higher sides to balance in information science.

Coleman discusses the fall from citation grace of the Journal of Education for Library and Information Science (JELIS) in terms of impact factor and declining subscriptions. Journal evaluation studies in library and information science based on subjective ratings are used to show the high rank of JELIS during the same period (1984–2004) and to explain why impact factors and perceptual ratings either singly or jointly are inadequate measures for understanding the value of specialized, scholarly journals such as JELIS. This case study was also a search for bibliometric measures of journal value. Three measures (journal attraction power, author associativity, and journal consumption power) were selected. Two of them were redefined as journal measures of affinity (the proportion of foreign authors) and associativity (the amount of collaboration), and calculated as objective indicators of journal value. The affinity and associativity for JELIS calculated for 1984, 1994, and 2004, and consumption calculated for 1985 and 1994, show a holding pattern. The author concludes that journal value is multidimensional and citations do not capture all facets: costs, benefits, and measures for informative and scientific value must be distinguished and developed in a fuller model of journal value.

Rowley and Urquhart present the first of a two-part article that establishes a model of the mediating factors that influence student information behavior concerning electronic or digital information sources that support their learning. This first article reviews the literature that supported the development of the research methodology for the Joint Information Systems Committee (JISC) User Behavior Monitoring and Evaluation Framework, as well as the literature that has subsequently helped to develop the model over the 5 years the Framework operated in the United Kingdom, in five cycles of research that were adjusted to meet the emerging needs of the JISC at the time. The literature review attempts to synthesize the two main perspectives in the research studies: (a) small-scale studies of student information behavior and (b) the studies that focus on the quantitative usage of particular electronic information services in universities, often including implications for training and support. The review indicates that there are gaps in the evidence concerning the browsing and selection strategies of undergraduate students and the interaction of some of the mediating influences on information behavior. The Framework developed a multimethod, qualitative and quantitative methodology for the continued monitoring of user behavior. This article discusses the methods used and the project management challenges involved, and concludes that, at the outset, intended impacts need to be specified carefully, and that funding needs to be committed at that point for a longitudinal study.

Gil-Leiva and Alonso-Arroyo analyze the keywords given by authors of scientific articles and the descriptors assigned to the articles to ascertain the presence of the keywords in the descriptors. Six hundred and forty INSPEC (Information Service for Physics, Engineering, and Computing), CAB (Current Agricultural Bibliography), ISTA (Information Science and Technology Abstracts), and LISA (Library and Information Science Abstracts) database records were examined. It was found that keywords provided by authors have an important presence in the database descriptors studied: nearly 25% of all the keywords appeared in exactly the same form as descriptors, with another 21%, though normalized, still detected in the descriptors.

Urquhart and Rowley present the second of a two-part article that establishes a model of the mediating factors that influence student information behavior concerning electronic or digital information sources that support their learning. The authors discuss the findings of the Joint Information Systems Committee Framework (1999–2004) and development of a model that includes both the individual (micro) and organizational (macro) factors affecting student information behavior. The macro factors are information resource design, information and learning technology infrastructure, availability and constraints to access, policies and funding, and organizational leadership and culture. The micro factors are information literacy, academics' information behavior, search strategies, discipline and curriculum, support and training, and pedagogy. The authors conclude that the mediating factors interact in unexpected ways and that further research is needed to clarify those interactions, especially between the macro and micro factors.

Nicholson and Smith examine the impact of the Health Insurance Portability and Accountability Act (HIPAA), which is designed to provide those handling personal health information with standardized, definitive instructions as to the protection of data. The authors discuss the present situation of privacy policies about library use data, outline the HIPAA guidelines to understand parallels between the two, and propose methods to create a de-identified library data warehouse based on HIPAA for the protection of user privacy.

Cathey et al. propose a distributed memory parallel version of the group average hierarchical agglomerative clustering algorithm to enable scaling the document clustering problem to large collections. Using standard message passing operations reduces interprocess communication while maintaining efficient load balancing. In a series of experiments using a subset of a standard TREC test collection, the parallel hierarchical clustering algorithm is shown to be scalable in terms of processors efficiently used and the collection size. Results show that the algorithm performs close to the expected O(n^2/p) time on p processors rather than the worst-case O(n^3/p) time. Furthermore, the O(n^2/p) memory complexity per node allows larger collections to be clustered as the number of nodes increases. While partitioning algorithms such as k-means are trivially parallelizable, the results confirm those of other studies that have shown that hierarchical algorithms produce significantly tighter clusters in the document clustering task. The authors demonstrate how the parallel hierarchical agglomerative clustering algorithm can be used as the clustering subroutine for a parallel version of the buckshot algorithm to cluster the complete TREC collection at near theoretical runtime expectations.
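
For readers unfamiliar with the group average agglomerative clustering mentioned above, the following is a minimal serial sketch, not the parallel algorithm of Cathey et al. It assumes Euclidean distance on toy 2-D points instead of document similarity, and reads "group average" as the mean of cross-cluster pairwise distances. The names `average_linkage`, `group_average_hac`, and `toy_points` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import dist


def average_linkage(points, a, b):
    """Mean pairwise distance between the members of clusters a and b."""
    total = sum(dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def group_average_hac(points, k):
    """Repeatedly merge the closest pair of clusters until k clusters remain."""
    # Start with every point in its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest average distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                points, clusters[pair[0]], clusters[pair[1]]
            ),
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)
    # Sort for a stable, readable result.
    return sorted(sorted(c) for c in clusters)


# Assumed toy data: two tight pairs and one outlying point.
toy_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
result = group_average_hac(toy_points, 3)
print(result)  # prints [[0, 1], [2, 3], [4]]
```

This naive loop recomputes every linkage on each pass, so it is only suitable for small inputs. It is meant to show the merge rule, not the time or memory behavior reported for the parallel version.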
