Land artificialization is a significant modern concern: it is irreversible, reduces the land suitable for agriculture, and causes environmental problems. Our project, Hérelles, aims to address this challenge by developing a framework for land artificialization management. In this framework, we associate urban planning rules in text form with clusters extracted from time series of satellite images. Doing so requires understanding the planning rules, with two key objectives: (1) to determine whether the constraints derived from the rules can be checked on satellite images and (2) to use these constraints to guide the labelling (or semantization) of clusters. The first step in this process is the automatic extraction of rules from urban planning documents written in French. To solve this problem, we propose a method based on the multilabel classification of textual segments and their subsequent summarization. This method includes a special format for representing segments, in which each segment has a title and a subtitle. We then propose a cascade approach to handle the hierarchy of class labels. Additionally, we develop several text augmentation techniques for French texts that can improve prediction results. Finally, we reformulate classified segments into concise text portions containing the elements experts need to construct rules. We adapt an approach based on Abstract Meaning Representation (AMR) graphs to generate these portions in French and conduct a comparative analysis with ChatGPT. We experimentally demonstrate that the resulting framework classifies each type of segment with more than 90% accuracy. Furthermore, our results indicate that ChatGPT outperforms the AMR-based approach, leading to a discussion of the advantages and limitations of both methods.