Abstract

Software plagiarism is a growing and serious problem that affects universities teaching computer science in particular and the quality of education in general. More and more students copy their thesis software from older theses or from internet databases. Manually checking source code to detect whether two programs are similar or identical is laborious and time-consuming, and may even be impossible given the size of existing digital repositories. An ontology is a way of describing a document's semantics, so it can readily be applied to source code files too. The OWL Web Ontology Language can be used to describe both the vocabulary and the taxonomy of source code written in a programming language. SPARQL is a query language with an SQL-like syntax that extracts stored or inferred information from ontologies. Our paper proposes a source code plagiarism detection method, based on ontologies created with the Protégé editor, which can be applied to scanning the software source code of students' theses.

Keywords: Ontology, OWL, SPARQL, Plagiarism, Protégé

1 Introduction

Today we have a huge volume of digital information, which is very useful on the one hand but a disadvantage on the other. The benefit is that, thanks to digital repositories, we can find any needed information more quickly than in the past (at the click of a button, as we usually say). The drawback is that finding similar or duplicated documents is now very difficult, especially when the job is done manually. That is why we look for alternative solutions in the field of plagiarism detection systems [1].

The term ontology is inherited from philosophy, where it refers to existence and the things that exist.
In computer science those things are represented by data, and an ontology generally describes the semantics of the terms used in a specific domain (in our case, programming), providing a vocabulary for that domain as well as a computerized specification of the meaning of the terms in that vocabulary. Ontologies range from taxonomies and classifications, through database schemas, to fully axiomatized theories. In recent years, ontologies have been adopted in many business and scientific communities as a way to share, reuse and process domain knowledge. Ontologies are now central to many applications such as scientific knowledge portals, information management and integration systems, electronic commerce, and semantic web services [2]. In our work we use ontologies to build the knowledge graph specific to each source code that we suspect of plagiarism.

The OWL Web Ontology Language is a specification of the World Wide Web Consortium (W3C) and a fundamental component of the Semantic Web initiative. OWL is based upon the Extensible Markup Language (XML), XML Schema [3], the Resource Description Framework (RDF) and RDF Schema (RDF-S) [4]. It comprises three sublanguages, OWL Lite, OWL DL and OWL Full; of these, OWL DL is the most often used because it provides maximum expressiveness while retaining computational completeness and decidability.

The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web.
It is particularly intended for representing metadata about web resources, such as the title, author, and modification date of a web page, copyright and licensing information about a web document, or the availability schedule for some shared resource [4]. However, by generalizing the concept of a web resource, RDF can also be used to represent information about things that can be identified on the web, even when they cannot be directly retrieved on the web.

RDF is intended for situations in which this information needs to be processed by applications, rather than only being displayed to people. RDF provides a common framework for expressing this information so that it can be exchanged between applications without loss of meaning. Since it is a common framework, application designers can leverage the availability of common RDF parsers and processing tools. …
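To make the RDF data model more concrete, the following is a minimal sketch in plain Python, assuming a toy representation in which every statement is a (subject, predicate, object) tuple. The `ex:` identifiers, the sample values and the `match` helper are hypothetical and serve only as illustration; a real application would rely on an RDF parser rather than hand-written tuples.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical metadata about a web page, written as RDF-style triples.
# "ex:alice" stands for a person: something that can be identified on
# the web even though it cannot be retrieved from it.
GRAPH: Set[Triple] = {
    ("ex:index.html", "dc:title", "Home page"),
    ("ex:index.html", "dc:creator", "ex:alice"),
    ("ex:index.html", "dc:date", "2012-05-01"),
    ("ex:alice", "foaf:name", "Alice"),
}


def match(
    graph: Set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# All statements made about the page, and the statement naming its author.
page_facts = match(GRAPH, s="ex:index.html")
author_facts = match(GRAPH, p="dc:creator")
```

Because every statement has the same three-part shape, any application that understands triples can process this information without knowing in advance which properties were used.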

Highlights

  • RDF can also represent information about things that can be identified on the web, even when they cannot be directly retrieved on the web

  • An ontology consists of a set of classes organized in a subsumption hierarchy to represent a domain's salient concepts, a set of slots associated with classes to describe their properties and relationships, and a set of instances of those classes, that is, individual exemplars of the concepts that hold specific values for their properties; the Protégé-OWL editor enables users to build ontologies for the Semantic Web, in particular in the W3C's Web Ontology Language (OWL)

  • The paper concludes that ontologies can be used to detect source code plagiarism


Summary

Introduction

We will use RDF and OWL in our method as the standards and formats for saving the ontologies created with the Protégé editor. We prefer this approach because both are W3C standards, which provides interoperability between our work and future related work. The results of SPARQL queries can be result sets or RDF graphs.
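The difference between the two kinds of query result can be illustrated with a minimal sketch in plain Python. It does not run SPARQL and is not the method of the paper: the `select` and `construct` helpers, the `src:`, `code:` and `ex:` names, and the sample facts are all hypothetical, and only a single triple pattern is supported. The first helper mimics a SELECT query, which yields a result set of variable bindings, and the second mimics a CONSTRUCT query, which yields a new graph.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical facts about one source file, stored as triples.
GRAPH: Set[Triple] = {
    ("src:bubbleSort", "rdf:type", "code:Method"),
    ("src:bubbleSort", "code:declaredIn", "src:Sorter"),
    ("src:swap", "rdf:type", "code:Method"),
    ("src:Sorter", "rdf:type", "code:Class"),
}


def select(graph: Set[Triple], pattern: Triple) -> List[Dict[str, str]]:
    """SELECT-like: one binding dict per triple that fits the pattern.

    Terms starting with '?' are variables; other terms must match exactly.
    """
    rows: List[Dict[str, str]] = []
    for triple in sorted(graph):
        binding: Dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            rows.append(binding)
    return rows


def construct(graph: Set[Triple], pattern: Triple, template: Triple) -> Set[Triple]:
    """CONSTRUCT-like: fill the template with each binding to build a new graph."""
    result: Set[Triple] = set()
    for row in select(graph, pattern):
        s, p, o = (row.get(term, term) for term in template)
        result.add((s, p, o))
    return result


# A result set (rows of bindings) versus a derived graph (a set of triples).
methods = select(GRAPH, ("?m", "rdf:type", "code:Method"))
derived = construct(GRAPH, ("?m", "rdf:type", "code:Method"), ("?m", "ex:kind", "method"))
```

Here `methods` is a list of rows, comparable to the table returned by an SQL query, whereas `derived` is itself a set of triples that could be saved or queried again.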

