@@ -38,17 +38,44 @@ \subsection{Mutual Info Score}
}
- % ---------- rand score ----------
+ % ---------- rand index ----------
\clearpage
\thispagestyle{clusteringstyle}
- \section{Rand Score}
- \subsection{Rand Score}
+ \section{Rand Index}
+ \subsection{Rand Index}
- % ---------- adjusted rand score ----------
- \clearpage
- \thispagestyle{clusteringstyle}
- \section{Adjusted Rand Score}
- \subsection{Adjusted Rand Score}
+ The Rand Index (RI) is a clustering metric that measures the similarity between two clusterings: the predicted labels generated by an
+ algorithm and the true labels, or labels coming from a reference clustering.
+
+ \begin{center}
+ $\mathrm{RI} = \frac{\text{number of agreeing pairs}}{\text{number of pairs}} = \frac{a + b}{\binom{n}{2}}$
+ \end{center}
+
+ where $a$ is the number of pairs of samples placed in the same cluster in both labelings, $b$ is the number of pairs placed in
+ different clusters in both labelings, and $\binom{n}{2}$ is the total number of pairs among $n$ samples.
+
+ The standard RI ranges from 0 to 1, where 1 indicates perfect agreement between the predicted and reference clusterings.
+ However, for random labelings the RI does not yield values close to 0, as it lacks an adjustment for chance. To address this,
+ the Adjusted Rand Index (ARI) refines the RI by accounting for randomness. ARI values range from -0.5 to 1, where scores near 0
+ signify clustering results comparable to random labelings. The RI is also equivalent to the accuracy score in a pairwise binary
+ classification task, evaluating the fraction of pairs correctly classified as ``same cluster'' (True Positives) or
+ ``different cluster'' (True Negatives).
+
+
+ \textbf{When to use Rand Index scores?}
+
+ Use RI/ARI when ground truth labels are available for benchmarking clustering performance, when a comparison of consensus
+ across multiple clusterings is needed, or when interpretability and a direct connection to pairwise agreement are desired.
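+
+ As a minimal usage sketch, both metrics are available in scikit-learn as \texttt{rand\_score} and
+ \texttt{adjusted\_rand\_score} (a plain \texttt{verbatim} environment is assumed here for the listing):
+
+ \begin{verbatim}
+ from sklearn.metrics import rand_score, adjusted_rand_score
+
+ labels_true = [0, 0, 1, 1, 2, 2]  # reference clustering
+ labels_pred = [1, 1, 0, 0, 2, 2]  # same partition, permuted label ids
+
+ print(rand_score(labels_true, labels_pred))           # 1.0
+ print(adjusted_rand_score(labels_true, labels_pred))  # 1.0
+ \end{verbatim}
+
+ Because both scores depend only on pairwise co-membership, relabeling the clusters leaves the result unchanged.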
+
+ \coloredboxes{
+ \item Symmetric: swapping the true labels with the predicted ones returns the same score.
+ \item Both RI and ARI have bounded ranges (lower and upper limits).
+ \item Can be used as a consensus score.
+ }
+ {
+ \item Requires knowledge of ground truth classes.
+ \item Like accuracy in binary classification, the unadjusted RI is affected by class imbalance, which can result in high
+ RI scores even when the clusterings are significantly different (see the sketch after this list).
+ }
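+
+ As an illustration of the imbalance issue above, a degenerate prediction that puts every sample into a single cluster can
+ still obtain a fairly high unadjusted RI, while the ARI drops to 0 (a hypothetical toy example):
+
+ \begin{verbatim}
+ from sklearn.metrics import rand_score, adjusted_rand_score
+
+ labels_true = [0]*10 + [1]*2  # imbalanced reference: 10 vs. 2 samples
+ labels_pred = [0]*12          # degenerate single-cluster prediction
+
+ print(rand_score(labels_true, labels_pred))           # ~0.70
+ print(adjusted_rand_score(labels_true, labels_pred))  # 0.0
+ \end{verbatim}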
% ---------- calinski harabasz score ----------
\clearpage