Abstract

Feature transformation is an essential task for boosting the effectiveness and interpretability of machine learning (ML). It aims to transform original data into an optimal feature space that enhances the performance of a downstream ML model. Existing studies either combine preprocessing, feature selection, and feature generation techniques to transform data empirically, or automate feature transformation with machine intelligence, such as reinforcement learning. However, existing studies suffer from: 1) high-dimensional, non-discriminative feature spaces; 2) inability to represent complex situational states; 3) inefficiency in integrating local and global feature information. To fill this research gap, we propose a novel group-wise cascading actor-critic perspective for automated feature transformation. Specifically, we formulate feature transformation as an iterative, nested process of feature generation and selection, where feature generation adds new features derived from the original features, and feature selection removes redundant features to control the size of the feature space. Our framework has three technical aims: 1) efficient generation; 2) effective policy learning; 3) accurate state perception. For efficient generation, we develop a tailored feature clustering algorithm and accelerate generation through feature group-group crossing. For effective policy learning, we propose a cascading actor-critic strategy that learns state-passing agents to select candidate feature groups and operations for fast feature generation. This strategy can learn policies effectively even when the original feature size is large and the feature generation action space grows exponentially, a regime in which classic Q-value estimation methods fail.
For accurate state perception of the feature space, we develop a state comprehension method that considers not only pointwise feature information but also pairwise feature-feature correlations. Finally, we present extensive experiments and case studies that illustrate a 24.7% improvement in F1 score compared with state-of-the-art methods, as well as robustness on high-dimensional data.
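The nested generate-then-select loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the clustering rule, the variance-based selection, and all function names (`cluster_features`, `cross_groups`, `select_features`) are simplified stand-ins, assuming NumPy. It shows only the core mechanic: cluster features into groups, generate new features by crossing two groups with a binary operation, then prune the augmented space back to a fixed size.

```python
import numpy as np

def cluster_features(X, n_groups):
    """Hypothetical grouping: order features by mean absolute correlation
    with all other features, then split the ordering into equal groups."""
    corr = np.corrcoef(X, rowvar=False)
    order = np.argsort(np.abs(corr).mean(axis=0))
    return np.array_split(order, n_groups)

def cross_groups(X, group_a, group_b, op):
    """Group-group crossing: apply a binary operation to every feature
    pair across the two groups, yielding |group_a| * |group_b| new features."""
    new_feats = [op(X[:, i], X[:, j]) for i in group_a for j in group_b]
    return np.column_stack(new_feats)

def select_features(X, k):
    """Stand-in for redundancy removal: keep the k highest-variance features
    to control the size of the feature space."""
    keep = np.argsort(X.var(axis=0))[-k:]
    return X[:, keep]

# One iteration of the nested generation/selection process.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))                      # 100 samples, 6 features
groups = cluster_features(X, n_groups=2)           # two groups of 3 features
X_new = cross_groups(X, groups[0], groups[1], np.multiply)  # 9 new features
X_aug = np.column_stack([X, X_new])                # 15 features total
X_sel = select_features(X_aug, k=8)                # prune back to 8
print(X_sel.shape)  # (100, 8)
```

In the full method, the choice of the two groups and the operation is made by the cascading actor-critic agents rather than fixed as here; crossing whole groups at once is what keeps generation efficient relative to enumerating individual feature pairs.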
