A framework for specifying explicit bias for revision of approximate information extraction rules

Ronen Feldman, Binyamin Rosenfeld, Jonathan Schler, Jonathan Stoppi, Yair Liberzon

doi:10.1145/347090.347125

Abstract

Information extraction (IE) is one of the most important techniques used in Text Mining. One of the main problems in building IE systems is that the knowledge elicited from domain experts tends to be only approximately correct. In addition, the knowledge acquisition phase for building IE rules usually takes a tremendous amount of time on the part of the expert and of the linguist creating the rules. We therefore need an effective means of revising our IE rules whenever we discover such an inaccuracy. The IE revision problem is how best to revise a deficient set of IE rules using information contained in examples that expose inaccuracies. The revision process is very sensitive to the implicit and explicit biases encoded in the specific revision algorithm employed. In a sense, each revision algorithm must provide two forms of bias: bias as to the place of the revision and bias as to the type of revision that should be performed. In this paper we present a framework for writing approximate IE rules that are provided with explicit bias. The proposed framework can be used by many existing revision algorithms. The purpose of the revision bias framework is to allow the user to declare his own bias in a simple and structured way, i.e., to express the conditions placed on the domain knowledge for a given revision operator to be applied. This language extends and generalizes the work reported in [Feldman et al. 1993]. It attacks the problem of writing IE rules from a novel perspective, one which enables much faster development of IE systems.

Full Text