Abstract
Provenance is increasingly important as a reliable indicator of data quality. However, provenance collection mechanisms in the reservoir engineering domain often yield incomplete provenance information. In this paper, we address the problem of predicting missing provenance information in reservoir engineering. Based on the observation that data items with specific semantic “connections” may share the same provenance, our approach annotates data items with domain entities defined in a domain ontology and represents these “connections” as sequences of relationships (also known as semantic associations) in the ontology graph. By analyzing annotated historical datasets with complete provenance information, we capture semantic associations that may imply identical provenance. A statistical analysis assigns a probability value to each discovered association, indicating the confidence of that association when it is used for future provenance prediction. We develop a voting algorithm that uses the semantic associations and their confidence measures to predict missing provenance information. Because existing provenance information can be incorrect due to errors in the manual provenance annotation procedure, we extend the voting algorithm with a prediction algorithm that takes into account both the confidence measures of the semantic associations and the accuracy of the existing provenance. A probability value is calculated as the trust of each prediction result. We develop the ProPSA (Provenance Prediction based on Semantic Associations) system, which uses our proposed approaches to handle incomplete and inaccurate provenance information in reservoir engineering. Our evaluation shows that the average precision of our approach is above 85% when one-third of the provenance information is missing.
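To make the confidence and voting steps concrete, the following is a minimal sketch and not the ProPSA implementation. It assumes that the confidence of an association is the fraction of historical item pairs linked by that association that share the same provenance, and that each neighboring data item casts a vote weighted by that confidence. The function names, the association labels (`same_well`, `same_field`), and the provenance values are hypothetical, and the returned vote share is only an illustrative stand-in for the trust probability described above.

```python
from collections import defaultdict


def estimate_confidence(history):
    """Estimate a confidence value per semantic association.

    history: iterable of (association, same_provenance) pairs taken from
    datasets with complete provenance (assumed input format).
    Returns {association: fraction of pairs that shared provenance}.
    """
    counts = defaultdict(lambda: [0, 0])  # association -> [matches, total]
    for association, same_provenance in history:
        counts[association][1] += 1
        if same_provenance:
            counts[association][0] += 1
    return {a: matches / total for a, (matches, total) in counts.items()}


def predict_provenance(neighbors, confidence):
    """Predict missing provenance by confidence-weighted voting.

    neighbors: iterable of (association, provenance) pairs, one per data item
    that is linked to the target item and has known provenance.
    Returns (predicted_provenance, vote_share), or (None, 0.0) if no
    neighbor carries a positive weight.
    """
    votes = defaultdict(float)
    for association, provenance in neighbors:
        # Associations never seen in the history contribute no weight.
        votes[provenance] += confidence.get(association, 0.0)
    total = sum(votes.values())
    if total == 0:
        return None, 0.0
    best = max(votes, key=votes.get)
    return best, votes[best] / total


# Hypothetical historical observations and neighbors of one target item.
history = [
    ("same_well", True), ("same_well", True), ("same_well", False),
    ("same_field", True), ("same_field", False),
]
confidence = estimate_confidence(history)
neighbors = [
    ("same_well", "engineer_a"),
    ("same_field", "engineer_b"),
    ("same_well", "engineer_a"),
]
prediction, share = predict_provenance(neighbors, confidence)
print(prediction, round(share, 3))
```

The extension that accounts for inaccurate existing provenance could, under the same assumptions, multiply each vote by an estimated accuracy of the neighbor's recorded provenance; that weighting is not shown here because the abstract does not specify how it is computed.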