Abstract

Many writers of non-fictional texts engage intensively in exploratory web search during background research on their essay topic. Though understanding such search behavior is necessary for developing search engines that specifically support writing tasks, it has been neither systematically recorded nor analyzed. This paper contributes part of the missing research: We report on the outcomes of a large-scale corpus construction initiative to acquire detailed interaction logs of writers who were given a writing task on 150 pre-defined TREC topics. The corpus is freely available to foster research on exploratory search. Each essay is at least 5000 words long and comes with a chronological log of search queries, result clicks, web browsing trails, and fine-grained writing revisions that reflect the task completion status. To ensure reproducibility, a fully-fledged, static web search environment has been created on top of the ClueWeb09 corpus as part of our initiative. In this paper, we present initial analyses of the recorded search interaction logs and give an overview of the insights gained from them: (1) essay writing behavior corresponds to search patterns that are rather stable for the same writer, (2) fact-checking queries often conclude a writing task, (3) recurring anchor queries are often submitted so as not to lose track of the main themes or to explore new directions, (4) query terms can be learned while searching and reading, and (5) the number of submitted queries is not a good indicator of task completion.
