Commit 0e4bac7

Merge branch 'keswani-Rohitkumar-davis-bouldin'
2 parents 56b4a97 + dfb8d06 commit 0e4bac7

1 file changed

+53
-9
lines changed

book/4-clustering.tex

Lines changed: 53 additions & 9 deletions
@@ -357,11 +357,61 @@ \subsection{V Measure}
357357
individual Homogeneity and Completeness scores. Additionally, pair-based measures like Adjusted Rand Index (ARI) or
358358
information-theoretic measures like Variation of Information (VI) may provide complementary perspectives in specific use cases.
359359

360-
% ---------- Davis Bouldin Score ----------
360+
% ---------- Davies-Bouldin Index ----------
361361
\clearpage
362362
\thispagestyle{clusteringstyle}
363-
\section{Davis Bouldin Score}
364-
\subsection{Davis Bouldin Score}
363+
\section{Davies-Bouldin Index}
364+
\subsection{Davies-Bouldin Index}
365+
366+
The Davies-Bouldin Index measures the quality of clustering by evaluating the average similarity between each cluster \( C_i \) and its most similar neighboring cluster
367+
\( C_j \), for \( i, j = 1, 2, \ldots, k \) with \( i \neq j \). This similarity \( R_{ij} \) is the ratio of within-cluster scatter (the sum of the two clusters' average point-to-centroid distances, that is, how tightly packed they are) to between-cluster separation (the distance between the two centroids).
368+
369+
% The Davies-Bouldin score is the average similarity measure of each cluster $C_{i}$ for $i = 1,2,...k$ with its most similar cluster $C_{j}$, where similarity $R_{ij}$ is the ratio of within-cluster distances to between-cluster distance. So the clusters which are farther apart and less dispersed will result in a better score. The Davies-Bouldin Index is defined as
370+
371+
\begin{center}
372+
\tikz{
373+
\node[inner sep=2pt, font=\Large] (a) {
374+
{
375+
$\displaystyle
376+
DB = \frac{1}{k} \sum_{{\color{nmlpurple}i}=1}^{k} \max_{{\color{cyan}j} \neq {\color{nmlpurple}i}} R_{{\color{nmlpurple}i}{\color{cyan}j}}
377+
$
378+
}
379+
};
380+
}
381+
% \end{center}
382+
383+
% \begin{center}
384+
\tikz{
385+
\node[inner sep=2pt, font=\Large] (a) {
386+
{
387+
$\displaystyle
388+
R_{{\color{nmlpurple}i}{\color{cyan}j}} = \frac{s_{\color{nmlpurple}i} + s_{\color{cyan}j}}{d_{\color{nmlpurple}i\color{cyan}j}}
389+
$
390+
}
391+
};
392+
\draw[-latex,nmlpurple, semithick] ($(a.north)+(1.2,0.05)$) to[bend left=15] node[pos=1, right] {cluster diameter} +(1,.5);
393+
\draw[-latex,cyan, semithick] ($(a.south)+(0.6,-0.05)$) to[bend left=15] node[pos=1, left] {distance between centroids} +(-1,-.5);
394+
}
395+
\end{center}
396+
397+
% A lower Davies-Bouldin score indicates better clustering, as it suggests that clusters are more compact and well-separated from one another.
398+
% where $s_{i}$ is the average distance between each point of cluster $i$ and centroid of that cluster and $d_{ij}$ is the distance between cluster centroids $i$ and $j$. The minimum Davies-Bouldin score is zero, with lower values indicating better clustering.
399+
400+
Here, \( s_i \) represents the average distance between each point in cluster \( i \) and the centroid of that cluster,
401+
while \( d_{ij} \) is the distance between the centroids of clusters \( i \) and \( j \). The Davies-Bouldin score has a
402+
minimum value of zero, with lower scores indicating better-defined clusters that are compact and well-separated.
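
As an illustration, consider a hypothetical clustering with \( k = 3 \) clusters. The values below are made up for this example and do not come from any dataset: \( s_1 = 1 \), \( s_2 = 1 \), \( s_3 = 3 \), with centroid distances \( d_{12} = 4 \), \( d_{13} = 10 \), and \( d_{23} = 6 \). Since \( R_{ij} = R_{ji} \), only three pairwise similarities are needed:
\[
R_{12} = \frac{1 + 1}{4} = 0.5, \qquad
R_{13} = \frac{1 + 3}{10} = 0.4, \qquad
R_{23} = \frac{1 + 3}{6} \approx 0.67 .
\]
Taking the largest similarity for each cluster and averaging over the three clusters gives
\[
DB = \frac{1}{3} \Big( \max(R_{12}, R_{13}) + \max(R_{12}, R_{23}) + \max(R_{13}, R_{23}) \Big)
= \frac{1}{3} \left( \frac{1}{2} + \frac{2}{3} + \frac{2}{3} \right)
= \frac{11}{18} \approx 0.61 .
\]
In this sketch the dispersed cluster 3 and its nearest neighbor, cluster 2, contribute the largest terms. Making cluster 3 more compact or moving it farther from cluster 2 would reduce \( R_{23} \) and therefore lower the index.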
403+
404+
\textbf{When to use the Davies-Bouldin Index?}
405+
406+
The Davies-Bouldin Index is particularly useful when we do not have ground truth labels and the clusters are roughly spherical and well described by their centroids.
407+
408+
\coloredboxes{
409+
\item The Davies-Bouldin Index is simpler to compute than the Silhouette score. Zero is the lowest possible score, and values closer to zero indicate better separation.
410+
}
411+
{
412+
\item The Davies-Bouldin Index is generally higher for convex clusters than for other kinds of clusters, such as density-based clusters.
413+
\item Because it relies on distances between centroids, the index is limited to Euclidean space.
414+
}
365415

366416
% ---------- Fowlkes-Mallows Index ----------
367417
\clearpage
@@ -413,12 +463,6 @@ \subsection{Fowlkes-Mallows Index}
413463
Information (NMI) can offer additional perspectives.
414464

415465

416-
417-
418-
419-
420-
421-
422466
% ---------- Silhouette Score ----------
423467
\clearpage
424468
\thispagestyle{clusteringstyle}
