Abstract

In many real-world applications, data are collected from multiple information sources and evolve over time. Such data are referred to as dynamic multi-source data. Efficiently selecting informative features from dynamic multi-source data is a challenging problem in data mining. Incremental feature selection with rough sets is an effective way to select features from dynamic data. However, existing methods focus on single-source data and are not suitable for dynamic multi-source data in which the set of data sources itself varies. To address this issue, we present an incremental feature selection method based on a matrix representation of the conditional entropy. We first propose a novel conditional entropy for multi-source data and discuss its properties, including monotonicity and boundedness. Then, a matrix characterization of the conditional entropy is presented, using the condition and decision relation matrices together with several matrix operators. Finally, considering the addition and deletion of data sources in multi-source data, we use the matrix approach to investigate incremental mechanisms for computing the conditional entropy and develop the corresponding incremental feature selection algorithms. Extensive comparative experiments verify the effectiveness and efficiency of the proposed method.
