This paper considers a dynamic classification problem in which the data being classified are assumed to change over time. Data streams such as computer network data, sensor data, and bank transactions are subject to data drift, the emergence of new classes, and anomalies. Existing methods for classifying data streams are analyzed. The analysis shows that no single effective classification method simultaneously addresses anomaly detection, drift, and model adaptation to new data. Controlling the decision-making area of each classifier is noted to be important for obtaining a high-quality solution. A dynamic classification method is proposed, based on a scalable ensemble of autoencoders whose decision-making areas are controlled using the EDCAP criterion. The properties of autoencoders are used to detect drift, anomalies, and new classes. Each autoencoder in the ensemble is trained to recognize a single class, and the size of its recognition area is controlled by the EDCAP criterion. The classification result is obtained by analyzing the responses of all autoencoders in the ensemble. When a new class of data is detected, the ensemble is scaled by adding a new autoencoder. When drift is detected, only the corresponding autoencoders are retrained. The proposed dynamic classifier is compared in quality with an incremental algorithm based on an adaptive Hoeffding tree. The advantages of the proposed method are demonstrated on a synthetic data stream that includes drift, a new class, and anomalies.
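The ensemble logic summarized above can be illustrated with a minimal sketch; it is not the authors' implementation. Several assumptions are made for brevity: a linear, PCA-style autoencoder stands in for a trained neural autoencoder, and a quantile of the training reconstruction errors stands in for the EDCAP criterion, which the abstract does not define. The names `LinearAutoencoder`, `AutoencoderEnsemble`, `fit_class`, `make_line_class`, and the `quantile` parameter are hypothetical.

```python
import numpy as np


class LinearAutoencoder:
    """Linear (PCA-style) autoencoder, a stand-in for a trained neural autoencoder."""

    def __init__(self, n_components=1, quantile=0.99):
        self.n_components = n_components
        # Hypothetical knob for the size of the recognition area; NOT the EDCAP criterion.
        self.quantile = quantile

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        # Rows of vt are principal directions; the leading ones span the code space.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        # Recognition area: reconstruction errors up to a quantile of the training errors.
        self.threshold_ = float(np.quantile(self.error(X), self.quantile))
        return self

    def error(self, X):
        """Reconstruction error (Euclidean norm) for each row of X."""
        centered = np.atleast_2d(np.asarray(X, dtype=float)) - self.mean_
        reconstructed = centered @ self.components_.T @ self.components_
        return np.linalg.norm(centered - reconstructed, axis=1)


class AutoencoderEnsemble:
    """One autoencoder per class; a sample outside every area is rejected."""

    def __init__(self):
        self.members = {}

    def fit_class(self, label, X):
        # Adds an autoencoder for a new class, or retrains the existing one after drift.
        self.members[label] = LinearAutoencoder().fit(X)

    def predict(self, x):
        # Error relative to each member's threshold; values <= 1 lie inside its area.
        scores = {lbl: ae.error(x)[0] / ae.threshold_ for lbl, ae in self.members.items()}
        best = min(scores, key=scores.get)
        return best if scores[best] <= 1.0 else None  # None: anomaly or unseen class


def make_line_class(rng, center, direction, n=200, noise=0.05):
    """Synthetic class: noisy points along a short line segment in 3D."""
    t = rng.uniform(-1.0, 1.0, size=(n, 1))
    return np.asarray(center) + t * np.asarray(direction) + rng.normal(0.0, noise, size=(n, 3))


rng = np.random.default_rng(0)
ensemble = AutoencoderEnsemble()
ensemble.fit_class("a", make_line_class(rng, [0, 0, 0], [1, 0, 0]))
ensemble.fit_class("b", make_line_class(rng, [0, 0, 5], [0, 1, 0]))

novel = [5.0, 5.0, 0.3]
before = ensemble.predict(novel)  # queried before class "c" exists
ensemble.fit_class("c", make_line_class(rng, [5, 5, 0], [0, 0, 1]))  # scale the ensemble
after = ensemble.predict(novel)
```

In this sketch, scaling the ensemble and retraining after drift share the `fit_class` call, so only the affected member is refit while the others stay untouched. How a rejected sample is told apart as an anomaly, a drifted sample, or a member of a new class is left out, since the abstract does not give that rule.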