Multisource incomplete mixed data fusion (MsIMDF) plays a crucial role in outlier detection: it produces complementary, informative, interpretable, and less noisy single-source data from which unexpected errors or behaviors can be identified. However, existing multisource data fusion approaches consider only homogeneous data and a single type of uncertainty information, overlooking mixed heterogeneous data and diverse uncertainty information. This limitation can degrade outlier detection performance. To address this issue, we propose MsIMDF-USF, a novel two-stage model that fully leverages the rich multisource knowledge and uncertainty information in incomplete mixed data. In the information fusion stage, the MsIMDF model combines multisource data into new single-source data using a minimum uncertainty strategy based on rough and fuzzy information. In the subsequent outlier detection stage, we use the fused data to reconstruct a neighborhood information network under a united-similar-fuzzy (USF) relationship. By considering both single-attribute and multi-attribute information, this reconstruction strengthens the connections between similar objects and weakens those between dissimilar ones. Outlier scores are then obtained from the stationary distribution of a Markov random walk on the reconstructed network. Experimental results on 16 real datasets demonstrate that the MsIMDF-USF model effectively extracts higher-quality data and exhibits high applicability and robustness in outlier detection tasks.
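To make the final scoring step concrete, the following is a minimal sketch of how outlier scores might be derived from the stationary distribution of a random walk on a similarity network. It is not the MsIMDF-USF procedure itself: the similarity matrix, the damping factor, the function names, and the rule "score = 1 - normalized stationary mass" are all illustrative assumptions, and the USF network reconstruction is replaced here by a small hand-written matrix.

```python
def stationary_distribution(similarity, damping=0.9, tol=1e-10, max_iter=1000):
    """Approximate the stationary distribution of a damped random walk.

    `similarity` is a square list of lists of non-negative weights
    (an assumed stand-in for the reconstructed neighborhood network).
    The damping factor is an assumption added to guarantee convergence.
    """
    n = len(similarity)

    # Row-normalize the weights into transition probabilities.
    transition = []
    for row in similarity:
        total = sum(row)
        if total == 0:
            # An isolated object jumps uniformly to any object.
            transition.append([1.0 / n] * n)
        else:
            transition.append([w / total for w in row])

    # Power iteration starting from the uniform distribution.
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new_pi = [
            damping * sum(pi[i] * transition[i][j] for i in range(n))
            + (1.0 - damping) / n
            for j in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(new_pi, pi))
        pi = new_pi
        if change < tol:
            break
    return pi


def outlier_scores(similarity, damping=0.9):
    """Hypothetical scoring rule: less stationary mass means more outlying."""
    pi = stationary_distribution(similarity, damping=damping)
    top = max(pi)
    return [1.0 - p / top for p in pi]


# Toy network: objects 0-2 are mutually similar, object 3 is weakly connected.
similarity = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.85, 0.1],
    [0.8, 0.85, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
pi = stationary_distribution(similarity)
scores = outlier_scores(similarity)
most_outlying = scores.index(max(scores))
```

In this toy network the walk rarely visits the weakly connected object, so it receives the least stationary mass and therefore the highest score under the assumed rule.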