Tree construction is a popular approach to supervised machine-learning tasks, yet comparatively little effort has gone into applying trees to unsupervised tasks. Traditional unsupervised trees recursively partition the space so that each resulting partition contains similar samples, where the notion of similarity depends on the model and the application. This paper tackles, for the first time, the problem of learning optimal oblique clustering trees, and proposes a linear-time algorithm for training them. As example applications in the field of the Internet of Things, trees are used to optimize infrastructure performance and clustering to optimize energy consumption.

The motivation for unsupervised tree models is to preserve the data manifold while keeping query time fast. Popular unsupervised tree models include k-d trees, random projection trees (RP trees), principal component analysis trees (PCA trees), and clustering trees. However, all existing unsupervised tree methods are sub-optimal, existing clustering trees are limited to axis-aligned splits, and some of these methods, such as k-d trees, suffer from the curse of dimensionality. Despite these shortcomings, trees are fast at query time. Conversely, a non-hierarchical method such as k-means performs well in high-dimensional problems, is locally optimal, and has an efficient learning algorithm, but it is not fast at query time: assigning a point requires comparing it against all of the centroids.

To address these issues, this paper proposes the k-means tree (KMT), a tree that outputs the centroids of clusters. Such a tree is fast at query time while learning cluster centroids as good as those of k-means. The learning problem is therefore to learn the centroids and the tree parameters optimally and jointly. This problem is first cast as a constrained minimization problem and then solved using the quadratic penalty method (a possible formulation and a code sketch are given below). The method learns clusters with k-means and gradually adapts the centroids to the outputs of an optimal oblique tree. Alternating optimization is used, with alternation steps consisting of weighted k-means clustering and tree optimization. The training complexity of the proposed algorithm is efficient, and the algorithm is optimal in the sense that the clusters and the tree are learned jointly. The trees used in KMT are oblique and, to our knowledge, this is the first time oblique trees have been applied to the task of clustering.

As a side product of the proposed method, sample reduction is explored and its merits are shown. Training KMT as a sample-reduction method is computationally faster than training k-means for the same purpose: the training complexity of the KMT sample-reduction algorithm is logarithmic in the size of the reduced training set, whereas that of k-means is linear in it. Finally, the proposed method is compared with other tree-based clustering algorithms and its superiority in terms of reconstruction error is shown; its query complexity is also compared with that of k-means.
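To make the constrained formulation concrete, the following is a minimal sketch in illustrative notation; the symbols (centroids $\mathbf{c}_k$, assignments $z_n$, tree predictive function $T(\mathbf{x};\boldsymbol{\Theta})$, penalty weight $\mu$) are our assumptions and not necessarily the paper's:

```latex
% Illustrative constrained problem: cluster the data while forcing each
% point's centroid to agree with the output of an oblique tree T(x; Theta).
\min_{\{\mathbf{c}_k\},\,\{z_n\},\,\boldsymbol{\Theta}}\;
  \sum_{n=1}^{N} \left\| \mathbf{x}_n - \mathbf{c}_{z_n} \right\|^2
\quad \text{s.t.} \quad
  T(\mathbf{x}_n; \boldsymbol{\Theta}) = \mathbf{c}_{z_n}, \qquad n = 1, \dots, N.

% Quadratic-penalty function: the constraints become a penalty term whose
% weight mu is driven upward over iterations, as in the penalty method.
Q(\{\mathbf{c}_k\}, \{z_n\}, \boldsymbol{\Theta}; \mu) =
  \sum_{n=1}^{N} \left\| \mathbf{x}_n - \mathbf{c}_{z_n} \right\|^2
  + \mu \sum_{n=1}^{N} \left\| T(\mathbf{x}_n; \boldsymbol{\Theta}) - \mathbf{c}_{z_n} \right\|^2 .
```

Under this reading, alternating optimization of $Q$ splits naturally into the two steps mentioned above: with the tree fixed, minimizing over the centroids and assignments is a weighted k-means problem; with the clustering fixed, minimizing over $\boldsymbol{\Theta}$ is an oblique-tree fitting problem.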
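Below is a minimal runnable sketch of this alternation, stated with loud assumptions: scikit-learn's axis-aligned DecisionTreeRegressor stands in for the paper's oblique-tree optimizer (which is not specified here), and the function name kmt_sketch, the penalty schedule mus, and the tree depth are illustrative choices, not the paper's.

```python
# Illustrative sketch of a quadratic-penalty alternation between clustering
# and tree fitting. NOT the paper's algorithm: an axis-aligned regression
# tree stands in for the oblique-tree optimization step.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeRegressor

def kmt_sketch(X, n_clusters=8, max_depth=4, mus=(0.1, 1.0, 10.0, 100.0)):
    # Step 0: plain k-means provides the initial centroids and assignments.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    C, z = km.cluster_centers_.copy(), km.labels_.copy()
    tree = None
    for mu in mus:  # penalty path: mu is driven upward over iterations
        # Tree step: fit a tree mapping each point to its current centroid.
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, C[z])
        t = tree.predict(X)  # tree outputs, held fixed in the next two steps
        # Assignment step: trade off the distance to a centroid against the
        # agreement between that centroid and the tree's output.
        d_data = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)  # N x K
        d_tree = ((t[:, None, :] - C[None, :, :]) ** 2).sum(-1)  # N x K
        z = np.argmin(d_data + mu * d_tree, axis=1)
        # Centroid step: closed-form weighted average of data and tree outputs.
        for k in range(n_clusters):
            mask = z == k
            if mask.any():
                C[k] = (X[mask] + mu * t[mask]).mean(axis=0) / (1.0 + mu)
    return tree, C

# Querying is one root-to-leaf descent instead of a scan over all centroids:
#   tree, C = kmt_sketch(X)
#   centroid = tree.predict(x.reshape(1, -1))
```

The closed-form centroid update in the last step comes from setting the gradient of the penalty function with respect to each centroid to zero, which is why this step reduces to a weighted k-means update.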
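Finally, the query-cost comparison mentioned at the end can be quantified under the standard assumption of a balanced oblique tree with $K$ leaves (one per centroid) in $d$ dimensions:

```latex
% Cost of answering one query (standard counting argument; balanced-tree
% assumption is ours):
\underbrace{O(Kd)}_{\text{k-means: distance to all } K \text{ centroids}}
\quad \text{vs.} \quad
\underbrace{O(d \log_2 K)}_{\text{oblique tree: one dot product per level}}
```

The same logarithmic-versus-linear contrast is consistent with the sample-reduction claim above, since the depth of a balanced tree grows only logarithmically with the number of leaf centroids it produces.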