Abstract

s Service. This is issued biweekly and includes the titles, authors' names and bibliographic references of currently published articles of chemical interest. The issue used was No. 1, 1971, dated 11 January. A typical entry from the issue is shown in Fig. 1; the bibliographic reference is given as the ASTM CODEN. The titles are recorded in upper-case characters. An occasional artefact arises from the insertion of additional space symbols. The printed publication includes a KWIC (Key Word In Context) index, and these spaces ensure that certain chemical word stems get indexed, such as QUINONE in Fig. 1 (the word is normally written as PHYLLOQUINONE).

A set of simple programs (written in PLAN, the ICL 1900 series assembly language) was devised to count occurrences of n-grams, i.e. strings of n characters including the space character, for n = 1, 2, 3 and 5. The program that counted single characters used the binary value of the character code to address a position in a 62-word array. Digrams were counted in a two-dimensional array of 62 × 62 = 3844 elements. Longer n-grams (n = 3 and 5) were generated by moving a window of n characters along each title record and creating a new record at each position (a space was inserted as the initial character of each title). These records were written to tape and subsequently sorted, counted and printed.
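
The three counting strategies described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the original PLAN code: the character set `ALPHABET`, the `CODE` lookup table and the function names are hypothetical, and an in-memory sort stands in for the tape sort. As in the description, single characters are counted in a flat array indexed by character code, digrams in a two-dimensional array, and longer n-grams by sliding a window along each title (prefixed with a space), then sorting the resulting records and counting runs of identical ones.

```python
from itertools import groupby

# Hypothetical character set standing in for the ICL 1900 character codes
# (the original used a 62-entry table). Each character maps to a small
# integer code that is used directly as an array index.
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-,.()'/+="
CODE = {ch: i for i, ch in enumerate(ALPHABET)}
SIZE = len(ALPHABET)


def count_unigrams(titles):
    """Count single characters in a flat array addressed by character code."""
    counts = [0] * SIZE
    for title in titles:
        for ch in title:
            counts[CODE[ch]] += 1
    return counts


def count_digrams(titles):
    """Count digrams in a SIZE x SIZE array (62 x 62 = 3844 cells originally)."""
    counts = [[0] * SIZE for _ in range(SIZE)]
    for title in titles:
        # Pair each character with its successor.
        for a, b in zip(title, title[1:]):
            counts[CODE[a]][CODE[b]] += 1
    return counts


def count_ngrams(titles, n):
    """Slide an n-character window along each title, then sort and count."""
    records = []
    for title in titles:
        text = " " + title  # a space is inserted as the initial character
        for i in range(len(text) - n + 1):
            records.append(text[i:i + n])  # one record per window position
    records.sort()  # stands in for the original tape sort
    # After sorting, identical records are adjacent, so count each run.
    return [(rec, sum(1 for _ in run)) for rec, run in groupby(records)]
```

For instance, `count_ngrams(["AB AB"], 3)` returns `[(" AB", 2), ("AB ", 1), ("B A", 1)]`. The direct-addressed arrays are compact only while the table size is small (SIZE squared cells for digrams), which is presumably why the longer n-grams were handled by generating, sorting and counting records instead.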
