Dirty data, particularly duplicate records, is a common problem in data management that degrades data quality, operational efficiency, and decision-making. This study highlights the importance of implementing sustainable deduplication strategies as a key step in managing dirty data, and explores solutions for detecting duplicate data by measuring text similarity. Using a literature review research method, we surveyed journal articles on data deduplication and text similarity techniques and compared several methods to identify the most effective approach. The data deduplication process in this study consists of two stages: 1) Matching, which calculates the similarity of a record against previously seen records, and 2) Clustering, which groups all records deemed duplicates of a single entity. The study further extends to the development of a Python application that identifies and groups similar customer data based on text similarity scores, measured using the Jaro-Winkler Similarity technique. Experimental results and evaluations show that the text similarity approach is effective in identifying duplicate data with a high degree of accuracy. This study emphasizes the importance of sustainable deduplication, in which the deduplication process is run periodically and continuously to maintain optimal data quality.
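The two-stage process described above (matching via Jaro-Winkler similarity, then clustering of matched records) can be sketched in Python. This is a minimal illustration, not the paper's actual application: the function names, the case-folding normalization, and the 0.9 similarity threshold are assumptions chosen for the example.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: combines matching characters and transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    # Characters match only within this sliding window.
    window = max(len1, len2) // 2 - 1
    match1, match2 = [False] * len1, [False] * len2
    m = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not match2[j] and s2[j] == c:
                match1[i] = match2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Count transpositions among the matched characters.
    t, k = 0, 0
    for i in range(len1):
        if match1[i]:
            while not match2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro-Winkler: boosts the Jaro score for a shared prefix (up to 4 chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def cluster(records: list[str], threshold: float = 0.9) -> list[list[str]]:
    """Stage 2: group each record with the first cluster whose
    representative (its first record) is similar enough."""
    clusters: list[list[str]] = []
    for rec in records:
        for cl in clusters:
            # Stage 1: match against the cluster representative.
            if jaro_winkler(rec.lower(), cl[0].lower()) >= threshold:
                cl.append(rec)
                break
        else:
            clusters.append([rec])  # no match: start a new cluster
    return clusters
```

For example, `cluster(["Martha Jones", "Marhta Jones", "John Smith"])` groups the first two (transposed) names into one cluster and leaves "John Smith" in its own, since their Jaro-Winkler score exceeds the threshold.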