Abstract
Recently, a fundamental study on the measurement of data quality introduced an ordinal-scaled measurement procedure. Besides the purely ordinal information about the level of quality, numerical information is induced when the uncertainty involved in measurement is taken into account. In the case where uncertainty is modelled by probability, this numerical information is ratio-scaled. An essential property of the mentioned approach is that the application of a measure to a large collection of data can be represented efficiently, in the sense that (i) the representation has a low storage complexity and (ii) it can be updated incrementally when new data are observed. However, this property only holds when the evaluation of predicates is clear-cut and involves no uncertainty. For some dimensions of quality, this assumption is far too strong and uncertainty comes into play almost naturally. In this paper, we investigate how the presence of uncertainty influences the efficiency of a measurement procedure. In doing so, we focus specifically on the case where uncertainty is caused by insufficient information and is thus modelled by means of possibility theory. It is shown that the amount of data that reaches a certain level of quality can be summarized as a possibility distribution over the set of natural numbers. We investigate an approximation of this distribution that has a controllable loss of information, allows for incremental updates and exhibits a low space complexity.
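To make the setting concrete, the following is a minimal, hypothetical Python sketch, not the construction from the paper itself, of how a possibilistic count might be maintained incrementally and then compressed. It assumes each observed record contributes a pair of possibility degrees (that it satisfies the quality predicate, that it does not), updates the distribution over counts with the standard max-min extension principle for non-interactive possibilistic variables, and then applies a coarse rounding step over a fixed grid of possibility levels to bound the number of distinct degrees that have to be stored. All names, the grid, and the pruning rule are illustrative assumptions.

```python
def update(pi, poss_yes, poss_no):
    """One incremental step of possibilistic counting.

    pi maps a count k to the possibility that exactly k of the records
    observed so far satisfy the quality predicate. (poss_yes, poss_no) is
    the possibility pair of the newly observed record; at least one of the
    two should equal 1.0 for the pair to be normalised.
    """
    new_pi = {}
    for k, p in pi.items():
        # the new record satisfies the predicate -> the count increases by one
        new_pi[k + 1] = max(new_pi.get(k + 1, 0.0), min(p, poss_yes))
        # the new record does not satisfy it -> the count stays the same
        new_pi[k] = max(new_pi.get(k, 0.0), min(p, poss_no))
    return new_pi


def approximate(pi, levels=(1.0, 0.75, 0.5, 0.25)):
    """Crude approximation with a controllable loss of information:
    round every possibility degree up to the nearest value in a fixed grid,
    so only len(levels) distinct degrees can occur, and drop counts whose
    degree falls below the smallest grid value (sacrificing the lower tail).
    """
    floor = min(levels)
    return {k: min(l for l in levels if l >= p)
            for k, p in pi.items() if p >= floor}


# Usage: start from "certainly zero records", feed in three observations,
# then compress the resulting distribution over counts.
pi = {0: 1.0}
for pair in [(1.0, 0.3), (0.6, 1.0), (1.0, 0.0)]:
    pi = update(pi, *pair)
print(approximate(pi))   # e.g. {3: 0.75, 2: 1.0, 1: 0.5}
```

The exact distribution above needs one entry per possible count, so its support grows with the number of observations; the paper's approximation with controllable loss, incremental updates and low space complexity is what replaces the naive rounding step sketched here.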