Large state spaces and large data: Utilizing neural network ensembles in reinforcement learning and kernel methods for clustering

Stefan Faußer

doi:10.18725/oparu-3241

Abstract

With reinforcement learning techniques, an agent learns an optimal policy by trial-and-error interaction with an environment. The integration of function approximation methods into reinforcement learning models allows for learning state-action values in large state spaces. Ensemble models can achieve more accurate and robust predictions than single learners. In this work, reinforcement learning ensembles are considered, where the members are artificial neural networks. It is shown analytically that the committees benefit from diversity in the value estimates. Empirical evaluations on two environments with large state spaces confirm the theoretical results. A selective ensemble may further improve the predictions by selecting a subset of the models from the entire ensemble. In the thesis, an algorithm for ensemble subset selection is proposed. Experiments show that selecting an informative subset of many agents may be more efficient than training full ensembles.

In clustering, a model is built for discovering group-like structures in unobserved data. In recent years, real-world data sets have become larger. However, training methods that compute an exact solution may be unable to learn from a full large data set because of their time complexity. Partitioning clustering methods with linear time complexity can handle large data sets but mostly assume spherically shaped clusters in the input space. In contrast, kernel-based methods may group the data in arbitrary shapes in the input space, but have quadratic time complexity. This work focuses on an approximate kernel clustering approach and empirically evaluates it on five real-world data sets. In semi-supervised clustering, external information is partially used for improving the clustering results. A method (SKC) is proposed that exploits the class labels to influence the positions of the centres. In the experiments, SKC outperformed the baseline methods on the external cluster validation measures.
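The abstract's claim that committees benefit from diverse value estimates can be illustrated with the standard ambiguity decomposition for an equally weighted average. This is a minimal sketch under stated assumptions, not the derivation given in the thesis: the function name `ambiguity_decomposition`, the synthetic estimates, and the noise scale are all hypothetical and chosen only for illustration.

```python
import numpy as np


def ambiguity_decomposition(member_values, target):
    """Split the squared error of an averaged ensemble into two terms.

    For an equally weighted average, the identity
        ensemble_error = mean_member_error - diversity
    holds exactly, so more spread among members (at equal average
    member error) means a lower ensemble error.
    """
    member_values = np.asarray(member_values, dtype=float)
    ensemble_value = member_values.mean()
    ensemble_error = (ensemble_value - target) ** 2
    mean_member_error = np.mean((member_values - target) ** 2)
    diversity = np.mean((member_values - ensemble_value) ** 2)
    return ensemble_error, mean_member_error, diversity


# Hypothetical data: ten members estimate one state value with noise.
rng = np.random.default_rng(0)
true_value = 1.0
estimates = true_value + rng.normal(scale=0.5, size=10)

ens_err, avg_err, div = ambiguity_decomposition(estimates, true_value)
print(f"ensemble error:    {ens_err:.4f}")
print(f"mean member error: {avg_err:.4f}")
print(f"diversity:         {div:.4f}")
```

Because the diversity term is never negative, the averaged estimate is never worse in squared error than the average member in this setting.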
