One key step in K-means clustering is deciding on the number of clusters. Ideally, the data itself should guide us on the optimal number of groups.
There are several ways to determine the number of clusters.
In a picture, this is what we want our groups to look like:
Elbow criterion
The elbow criterion helps us achieve this. It plots the ratio of within-cluster variance to between-cluster variance against the number of clusters. We want this ratio to be low, indicating tight clusters and high separation between them.
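To make the ratio concrete, here is a minimal sketch of how it could be computed for one clustering. It assumes that "variance" is measured as a sum of squared distances: each point to its own cluster centroid for the within-cluster part, and each cluster centroid to the overall centroid (weighted by cluster size) for the between-cluster part. The function name `variance_ratio` and the sample points are made up for illustration, and other definitions of the ratio are possible.

```python
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def variance_ratio(points, labels):
    """Within-cluster over between-cluster sum of squares (assumed definition)."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    overall = centroid(points)
    within = 0.0
    between = 0.0
    for members in clusters.values():
        c = centroid(members)
        # Spread of the points around their own cluster centroid.
        within += sum(sq_dist(p, c) for p in members)
        # Distance of the cluster centroid from the overall centroid,
        # weighted by the number of points in the cluster.
        between += len(members) * sq_dist(c, overall)

    if between == 0:
        # Happens with a single cluster: there is nothing to separate.
        raise ValueError("ratio is undefined when between-cluster variance is 0")
    return within / between


# Illustrative data: two obvious groups, around (0, 0) and around (10, 10).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

good = variance_ratio(points, [0, 0, 0, 1, 1, 1])  # labels match the groups
bad = variance_ratio(points, [0, 1, 0, 1, 0, 1])   # labels mix the groups
```

With labels that match the two visible groups, the ratio is close to zero. With labels that mix the groups, it is far larger, which is the behavior described above: a low ratio means tight clusters that are well separated.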
Plotted against the number of clusters, the ratio looks like this:
As we increase the number of clusters, this ratio initially drops significantly. However, there comes a point where adding more clusters doesn't lead to a substantial improvement.
The number of clusters at this point, known as the "elbow," is considered the ideal choice.
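The elbow is usually read off the graph by eye, but the idea can be sketched in code. The example below uses one simple heuristic, which is an assumption and not the only way to do it: pick the first number of clusters after which the relative drop in the ratio falls below a threshold. The function name `find_elbow`, the 10% threshold, and the ratio values are all made up for illustration.

```python
def find_elbow(ks, ratios, min_improvement=0.10):
    """Return the first k after which adding one more cluster improves
    the ratio by less than min_improvement (as a fraction of its value)."""
    for i in range(len(ks) - 1):
        relative_drop = (ratios[i] - ratios[i + 1]) / ratios[i]
        if relative_drop < min_improvement:
            return ks[i]
    # The ratio kept improving substantially over the whole range tried.
    return ks[-1]


# Made-up ratios for k = 2..6: big drops at first, then a plateau.
ks = [2, 3, 4, 5, 6]
ratios = [0.80, 0.35, 0.12, 0.11, 0.105]

elbow = find_elbow(ks, ratios)
print(elbow)  # 4
```

Going from 4 to 5 clusters lowers the ratio by only about 8%, so this heuristic stops at 4. The threshold is a judgment call, and on real data it is worth checking the result against the plot.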