
Commit 9edd30f

Rand Index
1 parent 7b7b406 commit 9edd30f

File tree

1 file changed: +35 -8 lines changed

book/4-clustering.tex

Lines changed: 35 additions & 8 deletions
@@ -38,17 +38,44 @@ \subsection{Mutual Info Score}
 
 }
 
-% ---------- rand score ----------
+% ---------- rand index ----------
 \clearpage
 \thispagestyle{clusteringstyle}
-\section{Rand Score}
-\subsection{Rand Score}
+\section{Rand index}
+\subsection{Rand index}
 
-% ---------- adjusted rand score ----------
-\clearpage
-\thispagestyle{clusteringstyle}
-\section{Adjusted Rand Score}
-\subsection{Adjusted Rand Score}
+The Rand Index (RI) is a clustering metric that measures the similarity between two clusterings, using the predicted labels generated by an algorithm
+and the true labels, or labels coming from a reference clustering.
+
+\begin{center}
+% Formula of the type:
+% Number of agreeing pairs / Number of pairs
+FORMULA GOES HERE
+\end{center}
+
+The standard RI ranges from 0 to 1, where 1 indicates perfect agreement between the predicted and reference clusterings.
+However, for random labelings, the RI does not yield values close to 0, as it lacks an adjustment for chance. To address this,
+the Adjusted Rand Index (ARI) refines the RI by accounting for randomness. ARI values range from -0.5 to 1, where scores near 0
+signify clustering results comparable to random labelings. RI is also equivalent to the accuracy score in a pairwise binary
+classification task, evaluating the fraction of pairs correctly classified as ``same cluster'' (True Positives) or
+``different cluster'' (True Negatives).
+
+
+\textbf{When to use Rand Index scores?}
+
+Use RI/ARI when ground truth labels are available for benchmarking clustering performance, when a consensus comparison across
+multiple clusterings is needed, or when interpretability and a connection to pairwise agreement are desired.
+
+\coloredboxes{
+\item Symmetric: swapping the true labels with the predicted ones returns the same score.
+\item Lower- and upper-bounded ranges for both RI and ARI.
+\item Can be used as a consensus score.
+}
+{
+\item Requires knowledge of ground truth classes.
+\item Like accuracy in binary classification, the unadjusted RI is affected by class imbalance, which can result in high
+RI scores even when the clusterings are significantly different.
+}
 
 % ---------- calinski harabasz score ----------
 \clearpage
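
The "FORMULA GOES HERE" placeholder is described by the comment above it as the ratio of agreeing pairs to the total number of pairs. One way the placeholder could eventually be filled is the standard form below; the symbols a, b, and n are notation introduced here, not taken from the book:

\begin{center}
$\mathrm{RI} = \dfrac{\text{number of agreeing pairs}}{\text{number of pairs}}
             = \dfrac{a + b}{\binom{n}{2}}$
\end{center}
where $a$ is the number of pairs placed in the same cluster by both labelings,
$b$ is the number of pairs placed in different clusters by both labelings,
and $n$ is the number of samples.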

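As a usage sketch of what the new section describes, scikit-learn exposes both scores as rand_score and adjusted_rand_score; the label arrays below are made up for illustration and are not part of the book:

import numpy as np
from sklearn.metrics import rand_score, adjusted_rand_score

labels_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
labels_pred = np.array([0, 0, 1, 1, 1, 1, 2, 2, 0])

# RI lies in [0, 1]; ARI is chance-adjusted and is close to 0 for random labelings
print(rand_score(labels_true, labels_pred))
print(adjusted_rand_score(labels_true, labels_pred))

# Symmetric: swapping true and predicted labels gives the same score
assert rand_score(labels_true, labels_pred) == rand_score(labels_pred, labels_true)

# Unadjusted RI stays fairly high even for random labelings, while ARI drops towards 0
rng = np.random.default_rng(0)
random_pred = rng.integers(0, 3, size=labels_true.size)
print(rand_score(labels_true, random_pred))
print(adjusted_rand_score(labels_true, random_pred))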
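
The equivalence to pairwise binary accuracy mentioned in the added text can also be checked directly; a minimal sketch, again with made-up labels:

from itertools import combinations

import numpy as np
from sklearn.metrics import rand_score

labels_true = np.array([0, 0, 1, 1, 2, 2])
labels_pred = np.array([0, 0, 1, 2, 2, 2])

# For every pair of samples, record whether the pair is "same cluster"
# under the true labeling and under the predicted labeling
pairs = list(combinations(range(len(labels_true)), 2))
same_true = np.array([labels_true[i] == labels_true[j] for i, j in pairs])
same_pred = np.array([labels_pred[i] == labels_pred[j] for i, j in pairs])

# RI is the fraction of pairs on which the two labelings agree,
# i.e. the accuracy of this pairwise same/different classification
ri_pairwise = float(np.mean(same_true == same_pred))
assert np.isclose(ri_pairwise, rand_score(labels_true, labels_pred))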