book/4-clustering.tex
Lines changed: 53 additions & 9 deletions
@@ -357,11 +357,61 @@ \subsection{V Measure}
357
357
individual Homogeneity and Completeness scores. Additionally, pair-based measures like Adjusted Rand Index (ARI) or
358
358
information-theoretic measures like Variation of Information (VI) may provide complementary perspectives in specific use cases.
359
359
360
-
% ---------- Davis Bouldin Score ----------
360
+
% ---------- Davies-Bouldin Index ----------
361
361
\clearpage
362
362
\thispagestyle{clusteringstyle}
363
-
\section{Davis Bouldin Score}
364
-
\subsection{Davis Bouldin Score}
363
+
\section{Davies-Bouldin Index}
364
+
\subsection{Davies-Bouldin Index}
365
+
366
+
The Davies-Bouldin Index measures the quality of clustering by evaluating the average similarity between each cluster \( C_i \) and its most similar neighboring cluster
367
+
\( C_j \), for \( i, j = 1, 2, \ldots, k \). This similarity \( R_{ij} \) is the ratio of the within-cluster distances (how tightly packed the two clusters are) to the between-cluster distance (how far apart they are).
368
+
369
+
% The Davies-Bouldin score is the average similarity measure of each cluster $C_{i}$ for $i = 1, 2, \ldots, k$ with its most similar cluster $C_{j}$, where similarity $R_{ij}$ is the ratio of within-cluster distances to between-cluster distance. So the clusters which are farther apart and less dispersed will result in a better score. The Davies-Bouldin Index is defined as
370
+
371
+
\begin{center}
372
+
\tikz{
373
+
\node[inner sep=2pt, font=\Large] (a) {
374
+
{
375
+
$\displaystyle
376
+
DB = \frac{1}{k} \sum_{{\color{nmlpurple}i}=1}^{k} \max_{{\color{cyan}j} \neq {\color{nmlpurple}i}} R_{{\color{nmlpurple}i}{\color{cyan}j}}
% A lower Davies-Bouldin score indicates better clustering, as it suggests that clusters are more compact and well-separated from one another.
398
+
% where $s_{i}$ is the average distance between each point of cluster $i$ and the centroid of that cluster, and $d_{ij}$ is the distance between the centroids of clusters $i$ and $j$. The minimum Davies-Bouldin score is zero, with lower values indicating better clustering.
399
+
400
+
Here, \( s_i \) represents the average distance between each point in cluster \( i \) and the centroid of that cluster,
401
+
while \( d_{ij} \) is the distance between the centroids of clusters \( i \) and \( j \). The Davies-Bouldin score has a
402
+
minimum value of zero, with lower scores indicating better-defined clusters that are compact and well-separated.
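As a small worked illustration, assume the usual definition \( R_{ij} = (s_i + s_j) / d_{ij} \) and three hypothetical clusters with made-up values \( s_1 = 1 \), \( s_2 = 2 \), \( s_3 = 1 \), \( d_{12} = 6 \), \( d_{13} = 4 \) and \( d_{23} = 5 \). These numbers are chosen only for illustration and do not come from any dataset. Since \( R_{ij} = R_{ji} \), three ratios are enough:

\[
R_{12} = \frac{1 + 2}{6} = 0.5, \qquad R_{13} = \frac{1 + 1}{4} = 0.5, \qquad R_{23} = \frac{2 + 1}{5} = 0.6
\]

\[
DB = \frac{1}{3} \left( \max(R_{12}, R_{13}) + \max(R_{21}, R_{23}) + \max(R_{31}, R_{32}) \right) = \frac{0.5 + 0.6 + 0.6}{3} \approx 0.57
\]

Each cluster contributes only its worst-case (largest) ratio, so a single pair of poorly separated clusters can raise the index for both of them.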
403
+
404
+
\textbf{When to use the Davies-Bouldin Index?}
405
+
406
+
The Davies-Bouldin Index is particularly useful when we do not have ground truth labels and the clusters are roughly spherical and centroid-based.
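The computation can be sketched in a few lines of pure Python. This is a minimal sketch, not a reference implementation: it assumes Euclidean distance and the usual definition \( R_{ij} = (s_i + s_j) / d_{ij} \), and the function name \texttt{davies\_bouldin} and the toy points are hypothetical. Note that the function takes only the points and their assigned cluster labels as input.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index (illustrative sketch, Euclidean distance assumed).

    points: sequence of equal-length coordinate tuples
    labels: cluster label assigned to each point
    """
    # Group the points by their assigned cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    # Centroid of each cluster: coordinate-wise mean of its points.
    centroids = {
        label: [sum(coord) / len(pts) for coord in zip(*pts)]
        for label, pts in clusters.items()
    }
    # s_i: average distance from each point in cluster i to its centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in pts) / len(pts)
        for label, pts in clusters.items()
    }

    # For each cluster, keep the largest R_ij = (s_i + s_j) / d_ij, then average.
    # Assumes no two centroids coincide (d_ij > 0).
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data: two tight groups that are far apart along the first axis.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = davies_bouldin(points, [0, 0, 1, 1])  # labels follow the two groups
bad = davies_bouldin(points, [0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

On this toy data the labelling that follows the two groups gets the lower value, in line with lower scores indicating more compact and better-separated clusters.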
407
+
408
+
\coloredboxes{
409
+
\item The Davies-Bouldin Index is simpler to compute than Silhouette scores. Zero is the lowest possible score, and values closer to zero indicate better separation.
410
+
}
411
+
{
412
+
\item The Davies-Bouldin Index is generally higher for convex clusters than for other kinds of clusters, such as density-based clusters.
413
+
\item The use of centroid distances limits the distance metric to Euclidean space.