Agglomerative Clustering
A bottom-up approach to do Hierarchical Clustering.
Given many data points in a something-dimensional space, produce a hierarchy.
In agglomerative clustering, we:
- start by putting each point into an orphan cluster
- greedily find the most similar clusters and merge them
The similarity metric in this algorithm is not specified. It can just be something like:
- euclidean distance
- manhattan distance
- …
Also, the distance between clusters is also unspecified, because we never defined cluster center. There can be many ways to define this:
- maximum/complete linkage — distance between furthest points in the clusters
- minimum/single linkage — distance between closest points in the clusters
- average linkage — average of all pairwise distances in two clusters
- centroid linkage — distance between cluster centres
- ward — distance = how much does merging that pair of clusters increase the sum of squares of distance away from the centre
The result can often be represented as a dendrogram.