Abstract

ChatGPT and other large language models (LLMs) have succeeded at natural and computer language processing tasks of varying complexity. This brief communication summarizes lessons learned from a series of investigations into ChatGPT's use for the complex text analysis task of research quality evaluation. In summary, ChatGPT is very good at understanding and carrying out complex text processing tasks, in the sense of producing plausible responses with minimal input from the researcher. Nevertheless, its outputs require systematic testing to assess their value because they can be misleading. In contrast to simple tasks, the outputs from complex tasks are highly varied, and better results can be obtained by repeating the prompts multiple times in different sessions and averaging the ChatGPT outputs. Varying ChatGPT's configuration parameters from their defaults does not seem to be useful, except for the length of the output requested.
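The repeat-and-average strategy mentioned above can be sketched minimally as follows. This is an illustrative assumption, not the authors' implementation: `query_model` is a hypothetical stand-in for a real ChatGPT API call, and the 1-4 score scale is assumed for illustration.

```python
import statistics
import random


def query_model(prompt, session_seed):
    """Hypothetical stand-in for one ChatGPT session (an assumption for
    illustration): in practice this would submit the prompt in a fresh
    session and parse a numeric quality score from the model's reply."""
    rng = random.Random((prompt, session_seed))
    return rng.choice([1, 2, 3, 4])  # placeholder 1-4 quality score


def averaged_quality_score(prompt, repeats=10):
    """Repeat the same prompt across independent sessions and average
    the scores, since single responses to complex tasks vary widely."""
    scores = [query_model(prompt, seed) for seed in range(repeats)]
    return statistics.mean(scores)


avg = averaged_quality_score("Rate the research quality of this article: ...",
                             repeats=15)
print(round(avg, 2))
```

Averaging over many sessions damps the run-to-run variability that single responses to complex tasks exhibit, at the cost of proportionally more API calls.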