Abstract

Stylochronometry studies how an author's style changes over time, specifically how time affects stylometric features. Analyzing whether time drift occurs is important, especially for the dataset creation process in other work in this area. In this paper, we perform experiments on the Google Code Jam dataset to show the influence of time drift on source code authorship attribution. Our experiments reveal significant time drift in stylometric features at a one-year difference, and the drift grows as the time difference increases. Another interesting result is that when our authorship attribution method is trained on later data and tested on earlier data, the time drift is lower than in the opposite direction. We also found a relationship between the length of the source code and the accuracy of our authorship attribution method.
