Now that we understand agglomerative hierarchical clustering, let’s put it into practice using the same data we used for k-means: age (years), average table size purchased (square inches), number of purchases per year, and amount per purchase (dollars). One of the most common ways to plot hierarchical clustering results is a tree diagram, or dendrogram.
The values on the left refer to the row numbers of the original data set, while the values along the bottom measure distance. Reading from left to right, you can see the order in which clusters were merged to form larger clusters. Comparing this algorithm to k-means clustering, we find similar results. For example, the rows near the bottom of the dendrogram (19, 22, 21, 20, and 27) are grouped together – these are all of the customers who bought 2160 cm^2 tables, and they were grouped together by the k-means algorithm as well.
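The original analysis was done in R, but the same merge structure can be sketched in Python with SciPy. The data below is made up purely for illustration (it is not the customer data from the article); the four columns stand in for the same four features.

```python
# Sketch of agglomerative hierarchical clustering, assuming illustrative data;
# columns stand in for: age, table size, purchases/year, amount per purchase.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))  # 10 toy "customers", 4 features

# Each row of Z records one merge: the two clusters joined, their distance,
# and the size of the new cluster -- exactly what a dendrogram draws.
Z = linkage(X, method="average")
print(Z.shape)  # (n - 1, 4): n points require n - 1 merges

# dendrogram(Z, orientation="right")  # row labels on the left, distance on the bottom
```

Calling `dendrogram(Z)` (with matplotlib installed) reproduces the kind of plot described above, with leaf labels corresponding to row numbers of the data.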
Showing the results of this clustering algorithm as a dendrogram reinforces the structural difference between this algorithm and k-means – each data point is nested into successively larger clusters, unlike k-means, which creates a new set of non-overlapping clusters at each iteration.
If you are interested in seeing the R code I used to run the agglomerative hierarchical clustering algorithm and create these plots, it is all available on our Data Driven Daily GitHub page.
I hope this week on clustering algorithms has been helpful. As you can tell, even though these are unsupervised classification techniques, some human supervision and interpretation is still required – for example, deciding how many clusters to use (along with many other decisions, like how to initialize k-means or which measure of distance to use, which I encourage you to read more about). While clustering algorithms generally can’t give you the “right” answer at the push of a button, they are a great way to explore and understand your data!
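One concrete example of that human decision: with hierarchical clustering, choosing the number of clusters amounts to choosing where to cut the dendrogram. A sketch in Python with SciPy (made-up data; the choice of 3 clusters is arbitrary, just to illustrate the cut):

```python
# Sketch: picking a cluster count by cutting the dendrogram; data is illustrative.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
X = rng.normal(size=(12, 4))
Z = linkage(X, method="average")

# criterion="maxclust" cuts the tree so that at most 3 clusters remain;
# a different cut height would give a different clustering of the same tree.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)  # one cluster label per row of X
```

The analyst still has to judge which cut is meaningful – the algorithm only supplies the full hierarchy.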
There are many ways to measure the distance between two clusters. For example, it could be the minimum distance between any two points in different clusters, the maximum distance between any two points in different clusters, or the average distance of all pairs of points in different clusters.
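These three rules (often called single, complete, and average linkage) can be computed by hand on two tiny made-up clusters to see how they differ:

```python
# The three cluster-distance rules above, computed on two illustrative clusters.
import numpy as np

A = np.array([[0.0, 0.0], [1.0, 0.0]])  # cluster A: two points
B = np.array([[3.0, 0.0], [5.0, 0.0]])  # cluster B: two points

# all pairwise distances between a point in A and a point in B
d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

single = d.min()    # minimum distance between any two points (single linkage)
complete = d.max()  # maximum distance between any two points (complete linkage)
average = d.mean()  # average distance over all pairs (average linkage)
print(single, complete, average)  # 2.0 5.0 3.5
```

Note that the three rules give three different answers for the same pair of clusters, which is why the choice of linkage changes the dendrogram you get.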