
Commit 9edd30f

Rand Index
1 parent 7b7b406 commit 9edd30f

File tree

1 file changed: +35 -8 lines changed

book/4-clustering.tex

Lines changed: 35 additions & 8 deletions
@@ -38,17 +38,44 @@ \subsection{Mutual Info Score}
 
 }
 
-% ---------- rand score ----------
+% ---------- rand index ----------
 \clearpage
 \thispagestyle{clusteringstyle}
-\section{Rand Score}
-\subsection{Rand Score}
+\section{Rand index}
+\subsection{Rand index}
 
-% ---------- adjusted rand score ----------
-\clearpage
-\thispagestyle{clusteringstyle}
-\section{Adjusted Rand Score}
-\subsection{Adjusted Rand Score}
+The Rand Index (RI) is a clustering metric that measures the similarity between two clusterings, using the predicted labels generated by an algorithm
+and the true labels, or labels coming from a reference clustering.
+
+\begin{center}
+% Formula of the type:
+% Number of agreeing pairs / Number of pairs
+FORMULA GOES HERE
+\end{center}
+
+The standard RI ranges from 0 to 1, where 1 indicates perfect agreement between the predicted and reference clusterings.
+However, for random labelings, the RI does not yield values close to 0, as it lacks an adjustment for chance. To address this,
+the Adjusted Rand Index (ARI) refines the RI by accounting for randomness. ARI values range from -0.5 to 1, where scores near 0
+signify clustering results comparable to random labelings. RI is also equivalent to the accuracy score in a pairwise binary
+classification task, evaluating the fraction of pairs correctly classified as ``same cluster'' (True Positives) or
+``different cluster'' (True Negatives).
+
+
+\textbf{When to use Rand Index scores?}
+
+Use RI/ARI when ground truth labels are available for benchmarking clustering performance, when a consensus comparison across
+multiple clusterings is needed, or when interpretability and a connection to pairwise agreement are desired.
+
+\coloredboxes{
+\item Symmetric: swapping the true labels with the predicted ones returns the same score.
+\item Lower- and upper-bounded ranges for both RI and ARI.
+\item Can be used as a consensus score.
+}
+{
+\item Requires knowledge of ground truth classes.
+\item Like accuracy in binary classification, the unadjusted RI is affected by class imbalance, which can result in high
+RI scores even when the clusterings are significantly different.
+}
 
 % ---------- calinski harabasz score ----------
 \clearpage
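
The "FORMULA GOES HERE" placeholder is described by the comment above it as the ratio of agreeing pairs to the total number of pairs. One way the placeholder could eventually be filled is the standard form below; the symbols a, b, and n are notation introduced here, not taken from the book:

\begin{center}
$\mathrm{RI} = \dfrac{\text{number of agreeing pairs}}{\text{number of pairs}}
             = \dfrac{a + b}{\binom{n}{2}}$
\end{center}
where $a$ is the number of pairs placed in the same cluster by both labelings,
$b$ is the number of pairs placed in different clusters by both labelings,
and $n$ is the number of samples.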

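As a usage sketch of what the new section describes, scikit-learn exposes both scores as rand_score and adjusted_rand_score; the label arrays below are made up for illustration and are not part of the book:

import numpy as np
from sklearn.metrics import rand_score, adjusted_rand_score

labels_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
labels_pred = np.array([0, 0, 1, 1, 1, 1, 2, 2, 0])

# RI lies in [0, 1]; ARI is chance-adjusted and is close to 0 for random labelings
print(rand_score(labels_true, labels_pred))
print(adjusted_rand_score(labels_true, labels_pred))

# Symmetric: swapping true and predicted labels gives the same score
assert rand_score(labels_true, labels_pred) == rand_score(labels_pred, labels_true)

# Unadjusted RI stays fairly high even for random labelings, while ARI drops towards 0
rng = np.random.default_rng(0)
random_pred = rng.integers(0, 3, size=labels_true.size)
print(rand_score(labels_true, random_pred))
print(adjusted_rand_score(labels_true, random_pred))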
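
The equivalence to pairwise binary accuracy mentioned in the added text can also be checked directly; a minimal sketch, again with made-up labels:

from itertools import combinations

import numpy as np
from sklearn.metrics import rand_score

labels_true = np.array([0, 0, 1, 1, 2, 2])
labels_pred = np.array([0, 0, 1, 2, 2, 2])

# For every pair of samples, record whether the pair is "same cluster"
# under the true labeling and under the predicted labeling
pairs = list(combinations(range(len(labels_true)), 2))
same_true = np.array([labels_true[i] == labels_true[j] for i, j in pairs])
same_pred = np.array([labels_pred[i] == labels_pred[j] for i, j in pairs])

# RI is the fraction of pairs on which the two labelings agree,
# i.e. the accuracy of this pairwise same/different classification
ri_pairwise = float(np.mean(same_true == same_pred))
assert np.isclose(ri_pairwise, rand_score(labels_true, labels_pred))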