<p class="p1">Recent developments in technology have made it easier to produce digital content, especially textual articles. A negative side effect is rising public skepticism toward digital data because of plagiarism. Indonesia, one of the world’s most populous countries, is not immune to this problem. Addressing it requires authorship attribution (AA), yet AA for Indonesian articles has received little investigation. This research therefore applies the AA task to a dataset of Indonesian digital news articles. Continuing the previous research, the dataset was modified to increase its complexity by adding a new class, namely the author’s gender, and by balancing the distribution of data across labels to minimize potential overfitting. The model’s hyperparameters were also tuned to improve the results. This research successfully applied the IndoBERT model to the Indonesian AA task, achieving precision = 0.92, recall = 0.90, and F1-score = 0.91. These results indicate that Indonesian AA has considerable potential for further development, since it identifies writing patterns that may benefit forensics, plagiarism detection, and the analysis of Indonesian texts.</p>