ABSTRACT
Knowledge‐driven GIS increasingly requires multi‐source, multi‐type, and multi‐model crowd‐sensing spatiotemporal data, whose quality is difficult to guarantee and assess. Quality indicator information is widely present in unstructured web texts, so extracting it is crucial for supplementing the quality information available for crowd‐sensing spatiotemporal data. Recent advances in large language models show potential for this extraction task. However, it remains difficult to obtain accurate results from large language models across the differing quality indicators used for crowd‐sensing spatiotemporal data. We therefore designed a large language model fine‐tuned for spatiotemporal quality information extraction from quality description text (LLMFT‐STQIE). First, we establish a quality indicator vocabulary to determine whether a text contains quality indicator information about spatiotemporal data. Then, we create a two‐stage prompt model with QILE and QIVE prompts, each comprising the input text, task type, instructions, the quality indicator vocabulary, output format, and a reference case. This model is built on fine‐tuning techniques for large language models. The results show that LLMFT‐STQIE achieves an accuracy of 91% and a recall of 80%, improvements of 23% and 38%, respectively, over untuned large language models. These results further indicate that the proposed method extracts quality indicator information from web texts for crowd‐sensing spatiotemporal data accurately and with little effort. The study contributes to research on strategies for optimizing large language models for specific scenarios or task specifications.
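The vocabulary check and prompt assembly described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the vocabulary terms, the substring matching, the function names `find_indicators` and `build_prompt`, the field order, the `JSON` output format, and the placeholder instruction and reference-case strings are all assumptions made for illustration. Only the six prompt fields and the stage label QILE come from the abstract.

```python
# Hypothetical vocabulary for illustration; the paper's actual list is not reproduced here.
QUALITY_VOCAB = ("positional accuracy", "completeness", "temporal resolution")


def find_indicators(text, vocab=QUALITY_VOCAB):
    """Return vocabulary terms that occur in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return sorted(term for term in vocab if term in lowered)


def build_prompt(task_type, instructions, text, vocab, output_format, reference_case):
    """Assemble the six prompt fields named in the abstract into one string."""
    fields = [
        ("Task type", task_type),
        ("Instructions", instructions),
        ("Quality indicator vocabulary", ", ".join(vocab)),
        ("Output format", output_format),
        ("Reference case", reference_case),
        ("Input text", text),
    ]
    return "\n".join(f"{name}: {value}" for name, value in fields)


text = "The road network has high positional accuracy but poor completeness."
found = find_indicators(text)
if found:  # assumed gating: only texts that mention a vocabulary term are prompted
    # "JSON" and the angle-bracket placeholders are illustrative, not from the paper.
    prompt = build_prompt("QILE", "<stage instructions>", text, found, "JSON", "<reference case>")
```

A second call with the task type set to `QIVE` would build the second-stage prompt in the same way; what each stage asks the model to do is defined in the paper, not in this sketch.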