Clustering: Ways to Organize Data
Last week, Sean talked about how to use the traits you know about your customers to build personas by manually labeling clusters of customers. An alternative approach is to let a computer find the clusters for you. This is called unsupervised classification because you don't give the computer any labels up front; it decides how to group records based only on the values and characteristics in your data.
Another name for unsupervised classification is “clustering”. There are lots of different clustering techniques, differentiated by the approach they take to the problem. One of the most common distinctions is whether the clusters an algorithm produces can be nested, with smaller clusters sitting entirely inside larger ones. Nested clustering algorithms are called “hierarchical”, while unnested ones are called “partitioned”.
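To make “nested” concrete, here is a minimal, dependency-free sketch of agglomerative clustering on a handful of 1-D toy points. It assumes single linkage (the distance between two clusters is the gap between their closest members), which is only one of several possible choices. The function names `single_linkage_levels` and `is_nested` and the data are hypothetical, for illustration only. The sketch records the grouping at every number of clusters and checks that each finer grouping fits inside the coarser one.

```python
from itertools import combinations


def single_linkage_levels(points):
    """Naive agglomerative clustering (single linkage) on 1-D points.

    Returns a dict mapping k (number of clusters) to a list of clusters,
    where each cluster is a frozenset of point indices.
    """
    def gap(pair):
        # Single linkage: distance between the closest members of two clusters.
        a, b = pair
        return min(abs(points[i] - points[j]) for i in a for j in b)

    # Start with every point in its own cluster.
    clusters = [frozenset([i]) for i in range(len(points))]
    levels = {len(clusters): list(clusters)}
    while len(clusters) > 1:
        # Merge the two closest clusters, then record the new grouping.
        a, b = min(combinations(clusters, 2), key=gap)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        levels[len(clusters)] = list(clusters)
    return levels


def is_nested(finer, coarser):
    """True if every cluster in `finer` sits entirely inside one cluster of `coarser`."""
    return all(any(f <= c for c in coarser) for f in finer)


# Hypothetical toy data: two tight groups and one outlier.
points = [1.0, 1.5, 2.0, 8.0, 8.5, 20.0]
levels = single_linkage_levels(points)
for k in (3, 2):
    print(k, [sorted(c) for c in levels[k]])
# prints:
# 3 [[5], [0, 1, 2], [3, 4]]
# 2 [[5], [0, 1, 2, 3, 4]]
print(is_nested(levels[3], levels[2]))  # prints True
```

Because clusters are only ever merged, the 3-cluster grouping is guaranteed to sit inside the 2-cluster one. A partitioned algorithm such as k-means instead returns a single flat grouping for whatever number of clusters you ask for, and the groupings you get for different numbers of clusters are not guaranteed to nest like this.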
This week, I’ll dig into an example of each type of algorithm and show it in practice:
- K-means clustering, in theory
- K-means clustering, in practice
- Agglomerative hierarchical clustering, in theory
- Agglomerative hierarchical clustering, in practice
Let me know if you have any thoughts or questions along the way!