book/3-classification.tex
24 additions & 2 deletions
@@ -733,7 +733,7 @@ \subsection{D-squared Log Loss Score}
It measures the relative improvement of a model's log loss compared to a naive baseline model that always predicts the mean probability.
%
-%FORMULA GOES HERE
+FORMULA GOES HERE
%
As with R-squared, D-squared ranges from -infinity to 1, where 1 indicates perfect predictions, 0 suggests the model performs no better than the baseline, and negative values indicate
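For reference, the D-squared log loss score is conventionally defined as one minus the ratio of the model's log loss to the baseline's log loss (the convention followed by scikit-learn's \texttt{d2\_log\_loss\_score}); the notation below is a sketch, not necessarily the book's final formula:

\[
D^2 = 1 - \frac{\mathrm{LogLoss}(y, \hat{p})}{\mathrm{LogLoss}(y, \bar{p})}
\]

where $\hat{p}$ denotes the model's predicted probabilities and $\bar{p}$ the constant prediction equal to the empirical class frequencies of $y$.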
@@ -765,7 +765,7 @@ \subsection{P4-metric}
more balanced measure of classifier performance.
%
-%FORMULA GOES HERE
+FORMULA GOES HERE
%
The metric ranges from 0 to 1, where 1 indicates perfect classification (all probabilities equal 1) and 0 indicates complete failure (any probability equals 0).
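For reference, the P4 metric is the harmonic mean of the four conditional probabilities referenced above (precision, recall, specificity, and negative predictive value); the notation below is a sketch of that standard definition, not necessarily the book's:

\[
P_4 = \frac{4}{\frac{1}{\mathrm{Precision}} + \frac{1}{\mathrm{Recall}} + \frac{1}{\mathrm{Specificity}} + \frac{1}{\mathrm{NPV}}}
    = \frac{4 \cdot TP \cdot TN}{4 \cdot TP \cdot TN + (TP + TN)(FP + FN)}
\]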
@@ -792,6 +792,28 @@ \subsection{P4-metric}
\section{Cohen's Kappa}
\subsection{Cohen's Kappa}
+Cohen's Kappa is a metric that measures inter-rater reliability: the agreement between classifiers (or annotators), while accounting for the agreement that could occur by chance.
+Unlike most other metrics, which compare a classifier's predictions against a ground truth, this metric is designed to compare the labelings produced by two different human annotators.
+
+%
+FORMULA GOES HERE
+%
+
+The score ranges from -1 to 1, where 1 indicates perfect agreement, 0 suggests agreement no better than chance, and negative values indicate worse-than-chance agreement.
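For reference, the standard definition compares the observed agreement $p_o$ between the two annotators with the agreement $p_e$ expected by chance from their marginal label frequencies; the notation below is a sketch, not necessarily the book's:

\[
\kappa = \frac{p_o - p_e}{1 - p_e}
\]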
+
+\textbf{When to use Cohen's Kappa?}
+
+When you want to compute the agreement between two human annotators on a classification task.
+
+\coloredboxes{
+\item Accounts for agreement by chance.
+\item Widely accepted in many fields.
+}
+{
+\item It is not advised to use it as a classification metric, only to measure inter-annotator agreement.
+\item Some researchers argue that the metric is unreliable when dealing with rare events (imbalanced data).
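As a quick illustration of how Cohen's Kappa is computed, here is a minimal sketch: the annotator data and the helper \texttt{kappa} are hypothetical, and scikit-learn's \texttt{cohen\_kappa\_score} is used only as a cross-check.

# Minimal sketch: Cohen's Kappa for two annotators, computed from the
# definition kappa = (p_o - p_e) / (1 - p_e) and cross-checked with scikit-learn.
from collections import Counter

from sklearn.metrics import cohen_kappa_score


def kappa(labels_a, labels_b):
    """Cohen's Kappa between two label sequences of equal length."""
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations by two human annotators on the same ten items.
annotator_1 = ["spam", "spam", "ham", "ham", "spam", "ham", "ham", "spam", "ham", "ham"]
annotator_2 = ["spam", "ham", "ham", "ham", "spam", "ham", "spam", "spam", "ham", "ham"]

print(kappa(annotator_1, annotator_2))              # 0.5833...
print(cohen_kappa_score(annotator_1, annotator_2))  # should match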