• Clustering log data is fundamental for providing insights into many critical areas.
• Feedback such as labels and pairwise constraints can improve log clustering quality.
• A principled way to summarize the data can be used to deal with a high volume of data.

Machine-generated log data can provide valuable insights into many critical areas such as system failures, network security, and performance optimization. The growing volume and complexity of this data call for data mining approaches that are both scalable and flexible. In this paper, we propose a new approach for clustering machine-generated logs that combines coresets with user feedback. The coreset allows us to summarize the data efficiently and in a principled manner, so that a model fitted on the coreset performs similarly to one fitted on the full dataset. Furthermore, the formal approach we propose allows users to incorporate two types of feedback, labels and pairwise constraints, to further improve results and to cope better with the increasing complexity and variety of log datasets.
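
To make the coreset idea concrete, the following is a minimal sketch of one well-known construction, lightweight coreset sampling for k-means, and not necessarily the construction used in this paper. It assumes the logs have already been converted to numeric feature vectors; the function names `lightweight_coreset` and `kmeans_cost` and the synthetic data are hypothetical and chosen only for illustration. Points are sampled with a probability that mixes uniform sampling with sampling proportional to squared distance from the data mean, and each sampled point receives a weight that makes the weighted clustering cost an unbiased estimate of the full cost.

```python
import numpy as np


def lightweight_coreset(X, m, rng=None):
    """Sample a weighted coreset of m rows from X (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # Squared distance of every point to the overall data mean.
    sq_dist = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    total = sq_dist.sum()
    if total == 0:
        # All points coincide: fall back to uniform sampling.
        q = np.full(n, 1.0 / n)
    else:
        # Half uniform, half proportional to squared distance from the mean.
        q = 0.5 / n + 0.5 * sq_dist / total
    idx = rng.choice(n, size=m, replace=True, p=q)
    # Inverse-probability weights keep weighted sums unbiased.
    weights = 1.0 / (m * q[idx])
    return X[idx], weights


def kmeans_cost(X, centers, weights=None):
    """(Weighted) sum of squared distances to the nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    return d2.sum() if weights is None else (weights * d2).sum()


if __name__ == "__main__":
    # Synthetic stand-in for vectorized log lines (an assumption, not real data).
    data_rng = np.random.default_rng(0)
    X = np.vstack([
        data_rng.normal(loc=c, scale=1.0, size=(2000, 5))
        for c in (0.0, 6.0, -6.0)
    ])
    C, w = lightweight_coreset(X, m=300, rng=1)
    centers = X[data_rng.choice(len(X), size=3, replace=False)]
    print("full cost:   ", kmeans_cost(X, centers))
    print("coreset cost:", kmeans_cost(C, centers, w))
```

The two printed costs should be of similar magnitude for any fixed set of centers, which is the property that lets a clustering model be fitted on the small weighted sample instead of the full log dataset. Feedback in the form of labels or pairwise constraints is not modeled in this sketch.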