
Commit 49adf64

Cohen's Kappa
1 parent 13c57fa commit 49adf64

File tree

1 file changed (+24, -2 lines changed)


book/3-classification.tex

Lines changed: 24 additions & 2 deletions
@@ -733,7 +733,7 @@ \subsection{D-squared Log Loss Score}
It measures the relative improvement of a model's log loss compared to a naive baseline model that always predicts the mean probability.

%
- % FORMULA GOES HERE
+ FORMULA GOES HERE
%
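For reference, a minimal sketch of what the formula placeholder above could expand to, assuming D-squared is defined as the relative reduction in log loss against the mean-probability baseline (the notation LL, \hat{p}, and \bar{p} is assumed here, not taken from the book):

\[
D^2 = 1 - \frac{\mathrm{LL}(y, \hat{p})}{\mathrm{LL}(y, \bar{p})}
\]

where \hat{p} are the model's predicted probabilities and \bar{p} is the constant prediction equal to the observed mean of y.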
As with R-squared, D-squared ranges from -infinity to 1, where 1 indicates perfect predictions, 0 suggests the model performs no better than the baseline, and negative values indicate
@@ -765,7 +765,7 @@ \subsection{P4-metric}
more balanced measure of classifier performance.

%
- % FORMULA GOES HERE
+ FORMULA GOES HERE
%
The metric ranges from 0 to 1, where 1 indicates perfect classification (all probabilities equal 1) and 0 indicates complete failure (any probability equals 0).
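A hedged sketch of the P4 formula that the placeholder above stands in for, assuming the usual definition as the harmonic mean of precision, recall, specificity, and negative predictive value (NPV):

\[
P_4 = \frac{4}{\frac{1}{\mathrm{precision}} + \frac{1}{\mathrm{recall}} + \frac{1}{\mathrm{specificity}} + \frac{1}{\mathrm{NPV}}}
    = \frac{4 \, TP \, TN}{4 \, TP \, TN + (TP + TN)(FP + FN)}
\]

The "probabilities" mentioned above are these four conditional probabilities, which is why the metric equals 1 only when all of them are 1 and drops to 0 as soon as any of them is 0.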
@@ -792,6 +792,28 @@ \subsection{P4-metric}
\section{Cohen's Kappa}
\subsection{Cohen's Kappa}

+ Cohen's Kappa is a metric that measures inter-rater reliability, the agreement between two raters (or classifiers), while accounting for the agreement that could occur by chance.
+ Unlike most other metrics, which compare a classifier's predictions against a ground truth, it is designed to compare the labelings produced by two different human annotators.
+
+ %
+ FORMULA GOES HERE
+ %
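A minimal sketch of the standard formula for the placeholder above (the symbols p_o and p_e are assumed notation):

\[
\kappa = \frac{p_o - p_e}{1 - p_e}
\]

where p_o is the observed agreement (the fraction of items both raters label identically) and p_e is the agreement expected by chance, computed from each rater's label marginals.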
+
+ The score ranges from -1 to 1, where 1 indicates perfect agreement, 0 suggests agreement no better than chance, and negative values indicate worse than chance agreement.
+
+ \textbf{When to use Cohen's Kappa?}
+
+ When you want to compute the agreement between two human annotators on a classification task.
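As an illustrative, hedged calculation (the counts are invented for this sketch, not taken from the book): suppose two annotators label 100 items, both marking 20 positive and 60 negative, while 5 items are marked positive only by annotator A and 15 only by annotator B. The positive-label marginals are then 25 for A and 35 for B, so

\[
p_o = \frac{20 + 60}{100} = 0.80, \qquad
p_e = \frac{25}{100}\cdot\frac{35}{100} + \frac{75}{100}\cdot\frac{65}{100} = 0.575, \qquad
\kappa = \frac{0.80 - 0.575}{1 - 0.575} \approx 0.53
\]

Despite raw agreement of 0.80, the chance-corrected agreement is only moderate.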
+
+ \coloredboxes{
+ \item Accounts for agreement by chance.
+ \item Widely accepted in many fields.
+ }
+ {
+ \item It is not advised to use it as a classification metric, only to measure inter-annotator agreement.
+ \item Some researchers argue that the metric is unreliable when dealing with rare events (imbalanced data).
+ }
+
% ---------- Phi Coefficient ----------
\clearpage
\thispagestyle{classificationstyle}

0 commit comments
