A class and method taxonomy for object-oriented programs

David A Workman

doi:10.1145/511152.511161

Abstract

The object-oriented approach to software design, together with the programming languages (C++, Java, and Ada95) and design notations (e.g., UML) that support this paradigm, has precipitated new interest in developing and tailoring software metrics to more effectively quantify properties of OO systems. Specifically, this research on OO software is motivated by two related problems. 1) In many computer science courses, instructors are torn between two conflicting goals: (a) increasing the number and difficulty of programming assignments to raise students' problem-solving skills and maturity, and (b) giving meaningful feedback on the correctness and quality of the programs students write. To address this problem, we are developing an automated Java program grading system. This system will compare student programs to an oracle program prepared by the instructor for a given assignment. The oracle program represents the "ideal" solution. In addition to computing a quantitative score for a student program, the grading program will also provide feedback on changes the student could or should make to improve the quality of the design of his or her solution. 2) A problem that is all too common in the computing industry is software theft. This has led to much copyright infringement litigation within our court system. As an expert witness in such cases, one of the tasks I have frequently been asked to perform is to evaluate two programs to determine the nature and extent of their similarity. A tool such as our planned program grading system is needed to facilitate the kind of analysis required in such cases. In the academic world, the equivalent of software theft is plagiarism. Therefore, as an application complementary to program grading, our proposed system will also serve as a tool for identifying "cheaters" by comparing two student programs to one another, rather than to the oracle.
In summary, our goal is to develop the key algorithms and eventually a program analysis system that will effectively determine the similarity of two programs written in the same language. Since Java is becoming one of the most widely used programming languages, and because of its relatively "clean" syntax and semantics, Java will provide the focus for our initial investigation. Java programs are composed of three essential building blocks: packages, classes, and methods. Methods are the functional or procedural units that perform or realize the algorithms necessary to solve a computational problem. Methods are grouped with encapsulated data to define classes, which are new types that extend Java's set of primitive types. Finally, classes are organized into subsystems or libraries using packages. Thus, when comparing two Java programs to determine their similarity, we must establish a correspondence between the packages, classes, and methods of the two programs under consideration. This suggests we must ascertain, for a given pair of units, one from each program, whether or not they are sufficiently similar to warrant being identified as "matching" in our correspondence analysis. To be similar, they must be "doing essentially the same thing"; that is, they must both serve the same computational purpose. Even assuming we are successful in developing some technique for determining similarity of purpose, we are still faced with the potentially large number of unit pairs that must be considered in our analysis. The sheer magnitude of this computational problem thus looms as a major obstacle to obtaining any practical solution. Using the names of units to limit which pairs need to be compared, while certainly reducing the potential computational load, is not a very reliable strategy, particularly if the author of one program has made a deliberate attempt to disguise similarity with another program by uniformly changing names.
Thus, in an attempt to address both the computational load problem and the identification problem for comparison analysis, we plan to make an initial pass over each program to categorize methods and classes according to their purpose. The rationale is that two units will be selected for detailed comparison analysis only if they belong to the same purpose category. The focus of this paper, therefore, is to present definitions and examples of the purpose categories for methods and classes. How these purpose categories will be used in a larger comparison strategy is beyond the scope of this work. Refer to Lan [13] for a more complete and detailed description of our methodology.
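The pruning idea described above can be illustrated with a minimal Java sketch. It is not the authors' implementation: the class `PurposeBuckets`, the type `Unit`, and the four `Purpose` labels are hypothetical names chosen for illustration, and the sketch assumes each unit has already been assigned a category by some earlier pass. It shows only how grouping by category limits which cross-program pairs are forwarded to detailed comparison, without relying on unit names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Sketch: restrict detailed comparison to units that share a purpose category. */
final class PurposeBuckets {

    // Hypothetical category labels, for illustration only; the paper defines the actual taxonomy.
    enum Purpose { ACCESSOR, MUTATOR, CONSTRUCTOR, OTHER }

    /** A method or class from one program, tagged with an (assumed) purpose category. */
    static final class Unit {
        final String name;
        final Purpose purpose;

        Unit(String name, Purpose purpose) {
            this.name = name;
            this.purpose = purpose;
        }
    }

    /** Groups the units of one program by purpose category. */
    static Map<Purpose, List<Unit>> bucket(List<Unit> units) {
        Map<Purpose, List<Unit>> buckets = new EnumMap<>(Purpose.class);
        for (Unit u : units) {
            buckets.computeIfAbsent(u.purpose, k -> new ArrayList<>()).add(u);
        }
        return buckets;
    }

    /** Returns every pair (one unit from each program) whose categories match. */
    static List<Unit[]> candidatePairs(List<Unit> programA, List<Unit> programB) {
        Map<Purpose, List<Unit>> bucketsB = bucket(programB);
        List<Unit[]> pairs = new ArrayList<>();
        for (Unit a : programA) {
            // Names are never consulted, so uniform renaming does not affect the pairing.
            List<Unit> sameCategory =
                    bucketsB.getOrDefault(a.purpose, Collections.<Unit>emptyList());
            for (Unit b : sameCategory) {
                pairs.add(new Unit[] { a, b });
            }
        }
        return pairs;
    }
}
```

For programs with m and n units, comparing every pair costs m × n detailed comparisons; with bucketing, the cost is the sum over categories of the product of the two bucket sizes, which is never larger and is much smaller when units are spread across many categories.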
