A large petrochemical construction project is typically executed by multiple parties, all bound by contractual agreements. During the execution phase, issues may arise because work details are not clearly specified in the contract. These issues are formally communicated and documented through written correspondence letters. By identifying important keywords within these letters, a comprehensive narrative of the project, including its associated issues, can be reconstructed and analyzed. In this research, we introduce an adjusted TextRank algorithm that integrates external features from the Indonesian FastText language model and term frequency-inverse document frequency (TF-IDF) scores to identify important keywords in a dataset of correspondence letters from petrochemical projects. The adjustments refine phrase detection, the estimation of semantic relationships between words, and part-of-speech (POS) identification for words or phrases. Our results show that the proposed adjustments improve F1 scores over the standard TextRank and standard TF-IDF baselines by 24.1% and 25%, respectively.
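To make the general idea concrete, the following is a minimal sketch of a TextRank variant whose co-occurrence edges are weighted by embedding similarity and whose restart distribution is biased by TF-IDF scores. It is an illustration under stated assumptions, not the formulation used in this work: the abstract does not specify how the FastText and TF-IDF features enter the graph, so the weighting scheme, the function names (`cosine`, `tfidf`, `weighted_textrank`), the parameters, and the toy two-dimensional vectors standing in for Indonesian FastText embeddings are all hypothetical. Phrase detection and POS filtering are omitted.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def tfidf(docs):
    """Return one {term: tf-idf} dict per tokenized document (smoothed idf)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        scores.append({
            t: (c / len(d)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return scores


def weighted_textrank(tokens, vectors, bias, window=3, damping=0.85, iters=50):
    """Rank tokens on a co-occurrence graph.

    Assumptions (not taken from the paper): edge weight is the accumulated
    non-negative cosine similarity of co-occurring tokens, and `bias`
    (here TF-IDF) sets the restart distribution of the random walk.
    """
    # Link each token to the next (window - 1) tokens.
    weights = {}
    for i, a in enumerate(tokens):
        for b in tokens[i + 1:i + window]:
            if a == b:
                continue
            sim = max(cosine(vectors[a], vectors[b]), 0.0)
            key = frozenset((a, b))
            weights[key] = weights.get(key, 0.0) + sim

    nodes = sorted(set(tokens))
    neighbors = {n: {} for n in nodes}
    for key, w in weights.items():
        if w > 0:
            a, b = tuple(key)
            neighbors[a][b] = w
            neighbors[b][a] = w

    # Normalize the bias into a restart distribution; fall back to uniform.
    total = sum(bias.get(n, 0.0) for n in nodes)
    prior = {n: (bias.get(n, 0.0) / total if total else 1.0 / len(nodes))
             for n in nodes}

    score = dict(prior)
    for _ in range(iters):
        new_score = {}
        for n in nodes:
            rank = sum(score[m] * w / sum(neighbors[m].values())
                       for m, w in neighbors[n].items())
            new_score[n] = (1 - damping) * prior[n] + damping * rank
        score = new_score
    return score


# Toy data: English tokens and made-up 2-d vectors, for illustration only.
docs = [
    ["pipe", "delivery", "delay", "pipe", "inspection"],
    ["invoice", "payment", "delay"],
]
vectors = {
    "pipe": [0.9, 0.1],
    "delivery": [0.6, 0.5],
    "delay": [0.2, 0.9],
    "inspection": [0.8, 0.3],
    "invoice": [0.1, 0.7],
    "payment": [0.1, 0.8],
}

scores = weighted_textrank(docs[0], vectors, tfidf(docs)[0])
top = sorted(scores, key=scores.get, reverse=True)[:3]
print(top)
```

In a realistic setting the `vectors` dictionary would be filled from a pretrained embedding model and the token list would first be filtered by POS tag; both steps are left out here to keep the sketch dependency-free.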