Understanding Hierarchical Clustering

What is Clustering?

Dendrograms

The dendrogram is a tree-like structure that is mainly used to store each step as a memory that the Hierarchical Clustering algorithm performs. In the dendrogram plot, the Y-axis shows the Euclidean distances between the data points, and the x-axis shows all the data points of the given dataset.

Hierarchical Clustering

Hierarchical Clustering is of two types.

  1. Agglomerative Hierarchical Clustering: It is popularly known as a bottom-up approach, wherein each data or observation is treated as its cluster. A pair of clusters are combined until all clusters are merged into one big cluster that contains all the data.

Now for agglomeration methods a question arises that, how do we measure the dissimilarity between two cluster observations? To answer this question we use different algorithms like:

  • Maximum or complete linkage clustering: Computes all pairwise dissimilarities between the elements in cluster 1 and the elements in cluster 2, and considers the largest value of these dissimilarities as the distance between the two clusters. It tends to produce more compact clusters.
  • Minimum or single linkage clustering: Computes all pairwise dissimilarities between the elements in cluster 1 and the elements in cluster 2, and considers the smallest of these dissimilarities as a linkage criterion. It tends to produce long, “loose” clusters.
  • Mean or average linkage clustering: Computes all pairwise dissimilarities between the elements in cluster 1 and the elements in cluster 2, and considers the average of these dissimilarities as the distance between the two clusters. Can vary in the compactness of the clusters it creates.
  • Centroid linkage clustering: Computes the dissimilarity between the centroid for cluster 1 (a mean vector of length p, one element for each variable) and the centroid for cluster 2.
  • Ward’s minimum variance method: Minimizes the total within-cluster variance. At each step, the pair of clusters with the smallest between-cluster distance are merged. Tends to produce more compact clusters.

Determining Optimal Clusters

In the above case, there is no definitively clear optimal number of clusters. Although, we have borrowed down the list to just three (8, 9, 10) which can be compared easily.

Originally published at http://docs.google.com.

Data Science Enthusiast