Abstract

In the field of human–computer interaction (HCI), text entry methods can be evaluated through controlled user experiments or predictive modeling techniques. While the modeling approach requires a language model, the empirical approach necessitates representative text phrases as experimental stimuli. In this context, finding the phrase set with the best language representativeness is an optimization problem in which a solution is sought in a large search space. We propose a genetic algorithm (GA) based method for extracting a target phrase set from an available text corpus while optimizing its language representativeness. Kullback–Leibler divergence is used to evaluate candidates by comparing the digram probability distributions of the source corpus and the target sample. The proposed method is highly customizable, outperforms typical random sampling, and is language independent. The representative phrase sets generated by the proposed solution enable a more valid comparison of results across text entry studies. The open-source implementation allows easy customization of the GA-based sampling method, promotes its immediate use, and supports the reproducibility of this study. In addition, we provide heuristic guidelines for preparing text entry experiments that consider the experiment's intended design and the phrase set to be generated with the proposed solution.
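A minimal sketch of the evaluation step described above, assuming a Python implementation with illustrative function names (this is not the authors' open-source code): a candidate phrase set is scored by the Kullback–Leibler divergence between the digram (character-pair) probability distribution of the source corpus and that of the sampled phrases, and a GA can use the negated divergence as its fitness function.

```python
# Hypothetical sketch of KL-divergence-based fitness for phrase-set sampling.
import math
import random
from collections import Counter

def digram_distribution(phrases):
    """Relative frequencies of adjacent character pairs in a list of phrases."""
    counts = Counter()
    for phrase in phrases:
        for a, b in zip(phrase, phrase[1:]):
            counts[a + b] += 1
    total = sum(counts.values())
    return {dg: c / total for dg, c in counts.items()}

def kl_divergence(p, q, epsilon=1e-12):
    """D_KL(P || Q); digrams absent from the sample get a small floor probability."""
    return sum(p_i * math.log(p_i / q.get(dg, epsilon)) for dg, p_i in p.items())

def fitness(candidate_indices, corpus, corpus_dist):
    """Lower divergence means a more representative sample; the GA maximizes the negative."""
    sample = [corpus[i] for i in candidate_indices]
    return -kl_divergence(corpus_dist, digram_distribution(sample))

# Example: score one random candidate of two phrases from a toy corpus.
corpus = ["the quick brown fox", "pack my box with jugs", "type this phrase now"]
corpus_dist = digram_distribution(corpus)
candidate = random.sample(range(len(corpus)), 2)
print(fitness(candidate, corpus, corpus_dist))
```

In a full GA, such a fitness function would be combined with selection, crossover, and mutation over candidate index sets; the sketch shows only the representativeness criterion.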
