The 2007 implementation of the Office Open XML standard for Microsoft Word introduced the assignment of individual revision save identifiers (Rsid) to document editing sessions that end in a save action. The relevant standards, ECMA (2016) and ISO/IEC 29500-1:2016 (2016), stipulate that these Rsid should be randomised but allocated with increasing numerical value, thereby documenting the progress of the editing. As MS Word is the most widely used word processing software, Rsid appear to be a useful tool to examine and provide evidence for a wide range of common document generation, editing and modification processes and file management operations, with implications for document analysis including, but not limited to, academic integrity issues in student assignment submissions (e.g. contract cheating). This paper presents the results of a series of experiments conducted to assess whether and how well MS Word implements the ECMA and ISO/IEC standards. The results show that the number of allocated Rsid indeed increases with each edit and save action, with the previous Rsid carried over and retained. The newly allocated Rsid, however, do not conform to the standard, as the numerical value of an Rsid associated with a save action may be larger or smaller than any or all of those allocated during previous save actions. The allocation of a new Rsid is not necessarily caused by an edit event: a new Rsid can also be generated if a file is saved as rtf or sent as an e-mail from within MS Word, even though the file was not edited in any way. Rsid numbers are not generated if a person opens an MS Word document, reads it and closes the file without saving, making this action impossible to detect. MS Word template files on a given machine contain document (root) Rsid numbers that are generated when a newly installed application is launched for the first time.
As these will be embedded as legacy Rsid into every new file generated from that template file, they act as signatures for all MS Word documents that are created. The experiments have shown that user behaviour has a direct influence on the number of Rsid represented in a given file. Although the implementation of Office Open XML chosen by Microsoft is not compliant with the relevant standards, and thus Rsid cannot be used to determine the exact chronological order of all editing sequences within a given document, the Rsid retain their value for document forensics, as they are associated with specific edit events and illuminate the document writing and editing process.
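For readers who wish to inspect the Rsid of a file themselves, the following is a minimal sketch and not the procedure used in the experiments. It assumes a .docx file, which is a ZIP archive whose `word/settings.xml` part lists the Rsid under a `w:rsids` element, with the document (root) Rsid in `w:rsidRoot`. The function name `read_rsids` and the file name `report.docx` are hypothetical. The order in which values are listed is not assumed to reflect the order of the save actions.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by the w: prefix in settings.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_rsids(docx):
    """Return (root_rsid, [rsid, ...]) read from word/settings.xml.

    `docx` may be a path or a binary file-like object. Values are returned
    as the hexadecimal strings stored in the file.
    """
    with zipfile.ZipFile(docx) as zf:
        settings = ET.fromstring(zf.read("word/settings.xml"))

    rsids = settings.find(f"{{{W_NS}}}rsids")
    if rsids is None:
        # No Rsid table present in this file
        return None, []

    val_attr = f"{{{W_NS}}}val"
    root_el = rsids.find(f"{{{W_NS}}}rsidRoot")
    root_rsid = root_el.get(val_attr) if root_el is not None else None
    values = [el.get(val_attr) for el in rsids.findall(f"{{{W_NS}}}rsid")]
    return root_rsid, values


if __name__ == "__main__":
    # "report.docx" is a placeholder file name for illustration only
    root_rsid, values = read_rsids("report.docx")
    print("root Rsid:", root_rsid)
    for v in values:
        # Convert the hexadecimal string to an integer for numerical comparison
        print(v, int(v, 16))
```

Converting each value with `int(v, 16)` allows the numerical values to be compared across saved versions of the same document, for example to see whether a newly added Rsid is larger or smaller than those already present.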