Abstract
Real-world applications such as sensor networks and RFID networks usually generate uncertain data. Data uncertainty comes from many sources, such as measurement errors, limited precision, and data aggregation. Classical data mining applications need to be modified and extended to handle uncertain data; otherwise, their performance may degrade dramatically. In this paper, we define an uncertain data model for both numerical and categorical uncertain data and propose EMU, a new Expectation-Maximization-based algorithm for clustering uncertain data. The approach is designed to find the distribution parameters that maximize model quality on uncertain data and thereby correctly identify the clusters. Our clustering algorithm can process both numerical and categorical uncertain data. In our experiments, we use both synthetic and real data sets to evaluate the effectiveness and robustness of the proposed algorithm.
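To make the idea of Expectation-Maximization on uncertain data concrete, the following is a minimal sketch and not the EMU algorithm itself, whose details the abstract does not give. It assumes a simplified model that is our own illustration: each numerical observation is a one-dimensional value with a known measurement-noise variance, and the clusters are Gaussian. Under that assumption the E-step compares each observation to a cluster using the cluster variance plus the observation's own noise variance, and the M-step re-estimates the parameters from the posterior mean and variance of the unobserved true value. The names `em_uncertain_1d` and `simulate_uncertain_data` are hypothetical, and categorical attributes are not covered.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_uncertain_1d(xs, noise_vars, k, iters=100, min_var=1e-6):
    """Fit a k-component Gaussian mixture to noisy 1-D observations.

    Assumed model (illustrative, not taken from the paper):
    observed x_i = true z_i + e_i, with e_i ~ N(0, noise_vars[i]).
    """
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    total_var = sum((x - grand_mean) ** 2 for x in xs) / n
    variances = [max(total_var / k, min_var)] * k
    weights = [1.0 / k] * k

    def e_step():
        # Responsibilities use the cluster variance inflated by each
        # observation's own noise variance.
        resp = []
        for x, s2 in zip(xs, noise_vars):
            dens = [w * normal_pdf(x, m, v + s2)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            if total > 0.0:
                resp.append([d / total for d in dens])
            else:  # numerical underflow: fall back to uniform
                resp.append([1.0 / k] * k)
        return resp

    for _ in range(iters):
        resp = e_step()
        for j in range(k):
            rj = sum(r[j] for r in resp)
            if rj < 1e-12:
                continue  # empty component: leave its parameters unchanged
            num_mean = 0.0
            post = []
            for r, x, s2 in zip(resp, xs, noise_vars):
                gain = variances[j] / (variances[j] + s2)
                b = means[j] + gain * (x - means[j])   # posterior mean of z_i
                big_b = variances[j] * (1.0 - gain)    # posterior variance of z_i
                post.append((r[j], b, big_b))
                num_mean += r[j] * b
            new_mean = num_mean / rj
            new_var = sum(w * ((b - new_mean) ** 2 + big_b)
                          for w, b, big_b in post) / rj
            means[j] = new_mean
            variances[j] = max(new_var, min_var)
            weights[j] = rj / n

    resp = e_step()
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


def simulate_uncertain_data(n_per_cluster, seed=0):
    """Two clusters (true means 0 and 10, true std 1) with per-point noise."""
    rng = random.Random(seed)
    xs, noise_vars, truth = [], [], []
    for label, centre in enumerate((0.0, 10.0)):
        for _ in range(n_per_cluster):
            noise_std = rng.uniform(0.2, 1.0)
            z = rng.gauss(centre, 1.0)
            xs.append(z + rng.gauss(0.0, noise_std))
            noise_vars.append(noise_std ** 2)
            truth.append(label)
    return xs, noise_vars, truth
```

Calling `em_uncertain_1d(xs, noise_vars, k=2)` on the output of `simulate_uncertain_data(200)` returns the mixture weights, cluster means, cluster variances, and a hard cluster label per observation. Because the noise variance is modelled separately, the estimated cluster variances describe the underlying values rather than the noisier observations.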