Abstract

Determining the similarity of research data is not a simple task, as data formats differ widely between domains. In particular, because many formats are binary, a raw comparison of the files does not yield good results. This makes it hard to tell accurately how similar two pieces of research work are by comparing their data. Extracted interoperable metadata offers a way to describe data that is independent of the data format. This work therefore uses extracted interoperable metadata to create a method that determines the similarity of research data based on their metadata. The method utilizes domain knowledge about the extracted metadata and the way they are formulated. We establish a baseline and implement further methods for comparison. The results show that our method outperforms all other methods, especially those that compare the research data itself rather than the metadata. Since the results are promising, we propose further investigation on other datasets and of possible use cases.
