zettelkasten

Search IconIcon to open search
Dark ModeDark Mode

Agglomerative Clustering

A bottom-up approach to do Hierarchical Clustering.

Given many data points in a something-dimensional space, produce a hierarchy.

In agglomerative clustering, we:

  1. start by putting each point into an orphan cluster
  2. greedily find the most similar clusters and merge them

The similarity metric in this algorithm is not specified. It can just be something like:

  • euclidean distance
  • manhattan distance

Also, the distance between clusters is also unspecified, because we never defined cluster center. There can be many ways to define this:

  • maximum/complete linkage — distance between furthest points in the clusters
  • minimum/single linkage — distance between closest points in the clusters
  • average linkage — average of all pairwise distances in two clusters
  • centroid linkage — distance between cluster centres
  • ward — distance = how much does merging that pair of clusters increase the sum of squares of distance away from the centre

The result can often be represented as a dendrogram.