## Analyzing a Dendrogram

dendrogram is a tree data structure that allows us to represent the entire clustering hierarchy produced by either an agglomerative or divisive algorithm. The idea is to put the samples on the x axis and the dissimilarity level on the y axis. Whenever two clusters are merged, the dendrogram shows a connection corresponding to the dissimilarity level at which it occurred. Hence, in an agglomerative scenario, a dendrogram always starts with all samples considered as clusters and moves upward (the direction is purely conventional) until a single cluster is defined.

For didactic purposes, it’s preferable to show the dendrogram corresponding to a very small dataset, X, but all the concepts can be applied to any situation. However, with larger datasets, it will often be necessary to apply some truncations in order to visualize the entire structure in a more compact form.

## How to do it?

Let’s consider a small dataset, X, made up of 12 bidimensional samples generated by 4 Gaussian distributions with mean vectors in the range (-1, 1) × (-1, 1):

The dataset (with the labels) is shown in the following screenshot:

In order to generate a dendrogram (using SciPy), we first need to create a linkage matrix. In this case, we have chosen a Euclidean metric with Ward’s linkage (I encourage you to perform the analysis with different configurations):

The dm array is a condensed pairwise distance matrix, while Z is the linkage matrix produced by Ward’s method (the linkage() function requires the method parameter, which accepts, among others, the values single, complete, average, and ward). At this point, we can generate and plot the dendrogram (the dendrogram () function can automatically plot the diagram using a default or supplied Matplotlib axis object):

The diagram is displayed in the following screenshot:

As explained in the preceding screenshot, the x axis represents the samples intended to minimize the risk of cross-connections, while the y axis shows the dissimilarity level. Let’s now analyze the diagram from the bottom. The initial state corresponds to all samples considered as independent clusters (so the dissimilarity is null). Moving upward, we start to observe the first mergers. In particular, when the dissimilarity is about 0.35, samples 1 and 3 are merged.

The second step happens with a dissimilarity of slightly below 0.5, when the samples 0 and 9 are also merged. The process goes on until a dissimilarity of about 5.25, when a single cluster is created. Let’s now dissect the dendrogram horizontally when the dissimilarity is equal to 1.25. Looking at the underlying connections, we discover that the clustering structure is: {6}, {758}, {09410}, {11}, {213}.

## Clustering

Therefore, we have five clusters, with two of them made up of a single sample. It’s not surprising to observe that samples 6 and 11 are the last ones to be merged. In fact, they are much further apart than all the other ones. In the following screenshot, four different levels are shown (only the clusters containing more than one sample are marked with a circle):

As is easy to understand, the agglomeration starts by selecting the most similar clusters/samples and proceeds by adding the nearest neighbors, until the root of the tree has been reached. In our case, at a dissimilarity level equal to 2.0, three well-defined clusters have been detected. The left one is also kept in the next cut, while the two right ones (which are clearly closer) are selected for merging in order to yield a single cluster. The process itself is straightforward and doesn’t need particular explanations; however, there are two important considerations.

The first one is inherent to the dendrogram structure itself. Contrary to other methods, hierarchical clustering allows observing an entire clustering tree, and such a feature is extremely helpful when it’s necessary to show how the process evolves by increasing the dissimilarity level. For example, a product recommender application could not provide any information about the desired number of clusters representing the users, but executive management might be interested in understanding how the merging process is structured and evolves.

In fact, observing how clusters are merged allows deep insight into the underlying geometry and it’s also possible to discover which clusters could potentially be considered as parts of larger ones. In our example, at level 0.5, we have the small cluster {13}. The question of “what samples can be added to this cluster by increasing the dissimilarity?” can be answered immediately with {2}. Of course, in this case, this is a trivial problem that can be solved by looking at the data plot, but with high-dimensional datasets, it can become more difficult without the support of a dendrogram.

The second advantage of dendrograms is the possibility to compare the behavior of different linkage methods. Using Ward’s method, the first mergers happen at quite low dissimilarity levels, but there’s a large gap between five clusters and three clusters. This is a consequence of both the geometry and the merging strategy. What happens, for example, if we employ a single linkage (which is intrinsically very different)? The corresponding dendrogram is shown in the following screenshot:

## Conclusion

The conclusion is that the dendrogram is asymmetric and the clusters are generally merged with a single sample or with small agglomerates. Starting from the right, we can see that samples {11} and {6} were merged very late. Moreover, sample {6} (which could be an outlier) is merged at the highest dissimilarity, when the final single cluster must be produced. The process can be better understood with the following screenshot: