Typology of Ambiguity on Representation of Information Needs

Yang-Woo Kim

doi:10.5860/rusq.53n4.313

Abstract

Disambiguating human inquiries, whether through a semantic or a lexical approach, is an essential process to consider in developing information systems and services. This paper discusses this process for design in two related domains, information systems and information services, focusing on one specific aspect of both: accommodating different types of full-sentence questions. The information system domain attempts to refine question categorization to develop question-answering (QA) systems. While significant work has been done in this area, question ambiguity has received limited consideration in question classification. This paper presents a classification of a set of full-sentence questions, originally collected for the Text REtrieval Conference (TREC) 8 and 9 Question Answering (QA) Tracks, according to their ambiguity, which could mislead the information system engaged. (2) The information service domain concerns situations in which prospective users are engaged in searching with the information needs represented in the question set. The discussion then extends to the possible intervention of a human information intermediary (i.e., a reference librarian) in the searching process. On the basis of the types and dimensions of ambiguity identified, three aspects of information systems and services are discussed, mainly in relation to user-system and user-information intermediary interactions. Those three aspects are (1) increasing user input to make initial queries less ambiguous, (2) reducing the search space by disambiguating queries, and (3) clustering search results on the basis of characteristics of prospective answers. Unlike the majority of question analyses conducted in previous work (reviewed in this paper), this study does not aim to categorize questions according to plausible inference, anticipating a single answer to a question. Instead, users' query statements are classified on the basis of what the author did not explicitly know about the inquirers' intentions.
This approach seems reasonable because what is manifestly known of an inquirer's intention from a single-sentence query is quite limited. In addition, the increase in fact-finding questions in the digital environment gives this study added significance, as the relevant literature indicates that virtual reference questions are increasing while traditional reference questions are decreasing. (3) This paper, therefore, addresses the following research questions:

* What are the different types of ambiguity in a set of questions originally collected for the TREC 8 and 9 QA Tracks?
* What are the implications of the ambiguities identified for user-system and user-information intermediary (i.e., a reference librarian) interactions?

BACKGROUND

Researchers in related fields have attempted to categorize questions (or user needs) using varying approaches. A review of the relevant literature indicates little consideration of sentence ambiguity, particularly in categorizing an exhaustive set of questions.

Internal Need vs. Expressed Need

Many studies have discussed possible discrepancies between people's internal needs and expressed needs. Taylor suggested the need to accommodate users' hidden needs; he presented four different types of user needs as levels of question formation: visceral, conscious, formalized, and compromised. (4) Several authors further developed Taylor's ideas, emphasizing the need to cope with the discrepancies between the internal (visceral, conscious, formalized) and the expressed (compromised) needs. Ingwersen emphasized the importance of identifying the relation between the formalized need and the compromised need. The compromised need (the question as presented to a librarian or system) is an expressed need. When there are discrepancies between the internal and the expressed needs, ambiguity in users' questions seems more likely. …
