Abstract
Applying the various recently introduced numerical methods to taxonomic studies requires a computer if significant amounts of data are to be analysed. Computer programs for the following numerical taxonomic methods were therefore developed for a GE 225 computer in a symbolic language, GAP:
(1) Single-Linkage Cluster Analysis (Sneath (1957 b))
(2) Average-Linkage Cluster Analysis (based on the unweighted variable-group method of Sokal and Michener (1958))
(3) Complete-Linkage Cluster Analysis (a modification of the Single-Linkage method not previously used in Bacteriology)
(4) a new Complete-Linkage Cluster Analysis
(5) Similarity Matrix Calculation, and Printout in triangular format
(6) Intra-Group and Inter-Group Similarity Estimation (Sneath (1957 b))
(7) Median Organism Estimation (Liston et al. (1963); based on positive characters only)
(8) Median Organism Estimation (based on both positive and negative characters)
(9) Comparison of Median Organism with ungrouped strains.
All of the above programs can handle 217 strains, with up to 200 characters each, except program 8, which can handle only 200 strains; programs 4, 5, 6, 7, and 9 can handle more than 217 strains. The limit on the number of strains that can be analysed is imposed by the size of the computer memory. The strain-handling capacities of the programs could be expanded by moving the strain data, which at present is held in memory, on to magnetic tape, or on to magnetic discs when these become available. All the programs employ a novel way of storing the strain data, which makes use of all the bits in any particular memory location. Compared with the conventional method, this means of storage gives a ten-fold reduction both in the amount of computer storage space needed and in the time taken to compare strains.
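The bit-level storage described above can be illustrated with a minimal sketch. It assumes that each character is scored 0 or 1 and that strains are compared with a simple matching coefficient; the abstract specifies neither, and the function names `pack_characters` and `simple_matching` are hypothetical. A Python integer stands in for the machine words; on a fixed-word machine the characters of one strain would span several words.

```python
def pack_characters(states):
    """Pack a sequence of 0/1 character states into one integer, one bit per character."""
    word = 0
    for position, state in enumerate(states):
        if state:
            word |= 1 << position
    return word


def simple_matching(a, b, n_characters):
    """Fraction of characters on which two packed strains agree.

    Assumption: a simple matching coefficient; the original programs may
    have used a different similarity measure.
    """
    mask = (1 << n_characters) - 1
    # XOR leaves a 1 bit wherever the two strains disagree.
    disagreements = bin((a ^ b) & mask).count("1")
    return (n_characters - disagreements) / n_characters


# Two hypothetical strains scored on eight characters.
strain_a = pack_characters([1, 0, 1, 1, 0, 0, 1, 0])
strain_b = pack_characters([1, 1, 1, 0, 0, 0, 1, 0])
similarity = simple_matching(strain_a, strain_b, 8)
```

Because a single XOR compares many characters at once, both the storage and the comparison time shrink relative to keeping one character per memory location, which is the kind of saving the abstract reports.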
A comparative study of Single-Linkage, Average-Linkage, and Complete-Linkage Cluster Analyses was made, using these programs and data for 121 strains of legume root-nodule bacteria examined over 100 characters (supplied by Dr. P. Graham, and published in Graham (1964)), and for 120 strains of enterobacteria examined over 77 characters (unpublished data supplied by Dr. W.H. Ewing). The end results produced by the three methods were, broadly speaking, similar. However, the methods differed markedly in the fineness of the clustering produced and in the analysis times. The Single-Linkage method was the quickest of the three (e.g. about 20 minutes for 121 strains with 100 characters each), but the differentiation into groups thus obtained was inferior to that produced by either Average- or Complete-Linkage Analysis. The latter two methods produced virtually identical groupings, but at different clustering levels (as is to be expected). However, the Average-Linkage method took considerably longer than the Complete-Linkage method (some 3 hours as compared to about 50 minutes, for 121 strains with 100 characters each). It is therefore suggested that Complete-Linkage Cluster Analysis is more useful than either the Single- or the Average-Linkage method, and should be used in preference to them in numerical taxonomic studies. The problem of how to cope with the analysis of large numbers of organisms was also investigated. When more than about 300 organisms were to be clustered, even the above Complete-Linkage method became impractical because of the amount of computer time involved.
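The three methods compared above differ in how the similarity between two clusters is defined. The following is a minimal sketch of agglomerative clustering on a precomputed similarity matrix, not a reconstruction of the GAP programs. The function name `agglomerate` and the four-strain matrix are hypothetical, and "average" here means the plain mean of all pairwise similarities, which may differ in detail from the variable-group procedure used in the thesis.

```python
def agglomerate(similarity, linkage):
    """Repeatedly merge the two most similar clusters.

    similarity: symmetric matrix (list of lists) of pairwise similarities.
    linkage: "single", "average", or "complete".
    Returns the merges in order as (level, members_a, members_b).
    """
    def between(a, b):
        values = [similarity[i][j] for i in a for j in b]
        if linkage == "single":      # most similar pair decides
            return max(values)
        if linkage == "complete":    # least similar pair decides
            return min(values)
        if linkage == "average":     # mean over all pairs (assumption)
            return sum(values) / len(values)
        raise ValueError("unknown linkage: " + linkage)

    clusters = [[i] for i in range(len(similarity))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                level = between(clusters[x], clusters[y])
                if best is None or level > best[0]:
                    best = (level, x, y)
        level, x, y = best
        merges.append((level, clusters[x], clusters[y]))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return merges


# Hypothetical similarity matrix for four strains.
S = [
    [1.0, 0.9, 0.6, 0.3],
    [0.9, 1.0, 0.7, 0.4],
    [0.6, 0.7, 1.0, 0.8],
    [0.3, 0.4, 0.8, 1.0],
]
single_levels = [level for level, _, _ in agglomerate(S, "single")]
average_levels = [level for level, _, _ in agglomerate(S, "average")]
complete_levels = [level for level, _, _ in agglomerate(S, "complete")]
```

On this small matrix all three criteria form the same two pairs first and differ only in the level at which the pairs finally join, which mirrors the observation that Average- and Complete-Linkage give similar groupings at different clustering levels.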
A new Complete-Linkage method was therefore developed. Like the other methods, it forms clusters around highly related organisms, but it is much less time-consuming because it does not require the whole similarity matrix to be calculated before clustering can begin. Analyses of the root-nodule bacteria and the enterobacteria data with this method produced virtually the same groupings as had been obtained by the original Complete-Linkage method, and in a much shorter time. It must, however, be mentioned that a dendrogram cannot be drawn from the results of this method. Nevertheless, it is suggested that if a large collection of organisms is to be analysed, this new Complete-Linkage method could be used to make the preliminary grouping into major clusters, and that a more detailed analysis of these clusters should then be made with the original Complete-Linkage method.
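The abstract does not give the algorithm of the new method, so the following is only one possible scheme with the properties it mentions, not the author's procedure. It assumes a single pass at a fixed similarity threshold: a strain joins the first cluster in which it is at least that similar to every member (a complete-linkage criterion), and similarities are computed on demand rather than from a full matrix. The names `threshold_clusters` and `matching`, the threshold value, and the five packed strains are hypothetical.

```python
def matching(a, b, n_characters=8):
    """Simple matching coefficient on bit-packed strains (an assumption)."""
    return 1 - bin(a ^ b).count("1") / n_characters


def threshold_clusters(strains, similarity, threshold):
    """One-pass grouping with a complete-linkage membership test.

    Each strain joins the first existing cluster in which its similarity to
    every member is at least `threshold`; otherwise it starts a new cluster.
    Similarities are computed only when needed, and `all` stops at the first
    member that fails the test.
    """
    clusters = []
    for index, strain in enumerate(strains):
        for cluster in clusters:
            if all(similarity(strain, strains[m]) >= threshold for m in cluster):
                cluster.append(index)
                break
        else:
            clusters.append([index])
    return clusters


# Five hypothetical strains, each scored on eight binary characters.
strains = [0b11110000, 0b11100000, 0b00001111, 0b00011111, 0b11110001]
clusters = threshold_clusters(strains, matching, 0.75)
```

Because such a pass works at one fixed level, it yields a flat set of groups rather than a hierarchy of merge levels, which is consistent with the remark that no dendrogram can be drawn. A scheme of this kind also depends on the order in which strains are presented, which is one reason to follow it with a detailed analysis of each major cluster.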