Abstract

We define the similarity of bakery recipes using several distance measures and identify groups of similar recipes using several clustering algorithms. Our analyses are based on the relative amounts of the ingredients in each recipe. We compare three clustering algorithms (k-means, k-medoids, and hierarchical clustering) to find the optimal number of clusters. Besides the standard Euclidean distance, we test three other measures (Hamming distance, Manhattan distance, and cosine similarity). Additionally, we reduce the impact of raw materials used in large quantities by applying two data transformations: taking the logarithm of the original data and binarizing the original data. Clustering recipes by their ingredients can improve the search for similar recipes and therefore help with the time-consuming process of developing new recipes. Using hierarchical clustering on the log-transformed data, we separate 704 recipes into three clusters, achieving a silhouette score of 0.531. We visualize our results as dendrograms that show the hierarchical separation of the recipes into groups and subgroups.
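The two transformations, the four measures, and the silhouette score named in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the ingredient vectors and all function names are hypothetical, `log1p` is assumed so that absent ingredients (zeros) stay defined, and cosine similarity is turned into a distance as one minus the similarity.

```python
import math

def to_relative(amounts):
    """Convert absolute ingredient amounts to fractions that sum to 1."""
    total = sum(amounts)
    return [a / total for a in amounts]

def log_transform(vec):
    """Dampen large quantities. log1p is an assumption: it maps 0 to 0."""
    return [math.log1p(x) for x in vec]

def binarize(vec):
    """Keep only whether an ingredient is present (1) or absent (0)."""
    return [1 if x > 0 else 0 for x in vec]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    """Fraction of positions that differ; most natural on binarized data."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def cosine_distance(a, b):
    """One minus cosine similarity (assumes neither vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def silhouette_score(points, labels, dist):
    """Mean silhouette over all points; needs at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:                       # singleton cluster scores 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)           # mean distance within own cluster
        b = min(                          # mean distance to nearest other cluster
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical recipes: grams of flour, water, yeast, sugar, butter.
bread = [500, 300, 10, 0, 0]
brioche = [500, 100, 10, 50, 250]

rel_bread, rel_brioche = to_relative(bread), to_relative(brioche)
print(euclidean(log_transform(rel_bread), log_transform(rel_brioche)))
print(hamming(binarize(bread), binarize(brioche)))
```

Binarizing discards quantities entirely, so the Hamming distance only counts which ingredients differ, while the log transform keeps quantities but compresses the gap between bulk ingredients such as flour and minor ones such as yeast.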
