Clustering

What is clustering? A cluster is a group of objects that are similar to one another and dissimilar to objects in other clusters. Clustering is the task of finding such clusters in a dataset using unsupervised methods, that is, without labeled examples.

Applications

  • Customer segmentation is the practice of partitioning a customer base into groups of individuals that have similar characteristics.
  • Banking - clustering normal transactions so that transactions falling outside those clusters can be flagged as potentially fraudulent.

Algorithms

  • Partition-based Clustering
  • Hierarchical Clustering - produces trees of clusters
    • Agglomerative, Divisive
  • Density-based Clustering
    • DBSCAN
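
To make the density-based family more concrete, here is a minimal sketch of the DBSCAN idea in plain Python. It is an illustration rather than a reference implementation: the names `dbscan`, `eps`, `min_pts`, and `NOISE`, and the toy points, are assumptions chosen for this example. A point with at least `min_pts` neighbors within distance `eps` starts or extends a cluster; points that no dense region reaches are labeled as noise.

```python
from math import dist

NOISE = -1  # hypothetical label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE if no dense region reaches it."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
        cluster_id += 1
    return labels


# Toy data (assumed): two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # → [0, 0, 0, 1, 1, 1, -1]
```

The isolated point ends up labeled as noise, which is the kind of outlier a fraud-detection use of clustering would look at more closely.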

Bottom-Up (Agglomerative) Clustering

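  • Dendrogram - the tree that records the order in which clusters are merged and the distance at each merge
  • Complete Linkage - the distance between two clusters is the distance between their farthest pair of points
  • Average Linkage - the distance between two clusters is the mean distance over all pairs of points, one from each cluster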
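
The bottom-up procedure can be sketched in a few lines of plain Python. This is a minimal, unoptimized illustration: the function names `linkage_distance` and `agglomerative`, the Euclidean distance, and the toy points are all assumptions made for the example. Every point starts as its own cluster, the two closest clusters are merged repeatedly, and the recorded merge history (which clusters merged, and at what distance) is the information a dendrogram draws.

```python
from itertools import combinations
from math import dist


def linkage_distance(a, b, points, method):
    """Distance between clusters a and b, given as lists of point indices."""
    pair_dists = [dist(points[i], points[j]) for i in a for j in b]
    if method == "complete":
        return max(pair_dists)  # farthest pair
    if method == "average":
        return sum(pair_dists) / len(pair_dists)  # mean over all pairs
    raise ValueError(f"unknown linkage method: {method}")


def agglomerative(points, method="complete"):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage_distance(pair[0], pair[1], points, method),
        )
        height = linkage_distance(a, b, points, method)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
        merges.append((a, b, height))  # one dendrogram node per merge
    return merges


# Toy data (assumed): two pairs of nearby points, far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
complete_merges = agglomerative(points, "complete")
average_merges = agglomerative(points, "average")

for a, b, height in complete_merges:
    print(f"merge {a} + {b} at distance {height:.3f}")
```

On this data both linkages merge the same pairs first; they differ only in the height of the final merge, where average linkage reports a smaller distance than complete linkage because it averages over all pairs instead of taking the farthest one.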