Refactor default judges #36

SecroLoL · 2025-01-11T06:58:53Z

Default judges used to be created with the syntax

from judgeval.scorers import JudgmentScorer
from judgeval.constants import APIScorer

scorer = JudgmentScorer(threshold=0.5, score_type=APIScorer.FAITHFULNESS)
...

We can now create them by directly importing

from judgeval.scorers import FaithfulnessScorer

scorer = FaithfulnessScorer(threshold=0.5)

This is partly a stylistic change, but it also pays off as we add scorers that require specific arguments, such as the new JSONCorrectness scorer, which needs a schema field:

scorer = JSONCorrectnessScorer(schema=...)

We couldn't have supported this with the previous syntax, but now we can.
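To make the benefit concrete, here is a minimal, self-contained sketch of the pattern. It is not judgeval's actual implementation: `SketchScorer`, `FaithfulnessSketch`, and `JSONCorrectnessSketch` are hypothetical stand-ins, and the string score types are assumptions. The threshold check mirrors the "Restrict threshold to between 0 <= x <= 1 on init" commit in this PR.

```python
from typing import Any, Dict


class SketchScorer:
    """Hypothetical base class: holds what every default scorer shares."""

    def __init__(self, threshold: float, score_type: str) -> None:
        # Assumed validation, mirroring the 0 <= threshold <= 1 restriction.
        if not 0 <= threshold <= 1:
            raise ValueError(f"threshold must be between 0 and 1, got {threshold}")
        self.threshold = threshold
        self.score_type = score_type


class FaithfulnessSketch(SketchScorer):
    """No extra arguments: the subclass only pins the score type."""

    def __init__(self, threshold: float) -> None:
        super().__init__(threshold=threshold, score_type="faithfulness")


class JSONCorrectnessSketch(SketchScorer):
    """Takes a scorer-specific argument that a generic constructor cannot express."""

    def __init__(self, threshold: float, schema: Dict[str, Any]) -> None:
        super().__init__(threshold=threshold, score_type="json_correctness")
        self.schema = schema


faithfulness = FaithfulnessSketch(threshold=0.5)
json_scorer = JSONCorrectnessSketch(threshold=1.0, schema={"answer": str})
```

Because each scorer owns its constructor, a new required argument only touches that one class rather than the shared signature.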

Major changes

Implement each default judgment scorer as its own class
Modify functions that run evaluations (standard evals and span-level evaluations) to accommodate this
Update all existing unit tests (UTs) to be compatible, and add UTs for all default scorers
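As a rough illustration of the second point, an evaluation runner can treat default and custom scorers uniformly if it depends only on a shared interface. The sketch below is an assumption about the shape of such a runner, not the code in this PR; `ScorerLike`, `LengthScorer`, `run_scorers`, and the `score` method are hypothetical names.

```python
from typing import Dict, List, Protocol


class ScorerLike(Protocol):
    """Hypothetical minimal interface shared by default and custom scorers."""

    threshold: float

    def score(self, output: str) -> float: ...


class LengthScorer:
    """Hypothetical custom scorer: 1.0 if the output is non-empty, else 0.0."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def score(self, output: str) -> float:
        return 1.0 if output.strip() else 0.0


def run_scorers(output: str, scorers: List[ScorerLike]) -> Dict[str, bool]:
    """Run every scorer and report pass/fail against its own threshold."""
    return {type(s).__name__: s.score(output) >= s.threshold for s in scorers}


results = run_scorers("Paris is the capital of France.", [LengthScorer(threshold=0.5)])
```

The runner never inspects which kind of scorer it received, which is the property the generalized span-level evaluation relies on in this sketch.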

JCamyre

Take a look at my comments, this is sick

Pipfile

e2etests/judgment_client_test.py

JCamyre · 2025-01-11T22:35:41Z

judgeval/common/tracer.py

            model=model,
            metadata={},
            log_results=log_results,
-            project_name="TestSpanLevel",
-            eval_run_name="TestSpanLevel",
+            project_name="TestSpanLevel1",  # TODO this should be dynamic


I will handle this and share my thoughts in Slack. In my multi-step eval PR, I added a project and trace name, which will tie in nicely with generating automatic eval run names (to improve UX while remaining clear).
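One possible shape for that automatic naming, purely as a sketch: derive the eval run name from the project name, the trace name, and a UTC timestamp. `make_eval_run_name`, the `demo_trace` value, and the name format are hypothetical; nothing here is defined by this PR or by judgeval.

```python
from datetime import datetime, timezone
from typing import Optional


def make_eval_run_name(project_name: str, trace_name: str,
                       now: Optional[datetime] = None) -> str:
    """Build a readable, reasonably unique eval run name (hypothetical format)."""
    # Normalize to UTC so the trailing "Z" is accurate and names sort consistently.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return f"{project_name}-{trace_name}-{now.strftime('%Y%m%dT%H%M%SZ')}"


name = make_eval_run_name(
    "TestSpanLevel",
    "demo_trace",
    now=datetime(2025, 1, 11, 22, 35, 41, tzinfo=timezone.utc),
)
```

Accepting `now` as a parameter keeps the function deterministic in tests while defaulting to the current time in normal use.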

judgeval/scorers/base_scorer.py

judgeval/scorers/judgeval_scorers/__init__.py

SecroLoL added 20 commits January 10, 2025 15:07

Experiment with new default scorer interface by adding JSON correctness

efa9ad5

Wrap FaithfulnessScorer into individual class

2012865

Wrap AnswerRelevancyScorer into individual class

bae3e87

Add ContextualPrecision wrapper for its own class

efe53af

Add ContextualRecall wrapper for its own class

059bf71

Add ToolCorrectnessScorer wrapper for its own class

3633a3e

Add ContextualRelevancy wrapper for its own class

3ae5fb1

Add Summarization wrapper for its own class

c2cb7b2

Add HallucinationScorer wrapper for its own class

900b503

Remove test segment of code file (we can just use client test file instead)

49c7d38

Update __init__ files of the scorers/ and judgeval_scorers/ dirs for new scorer structure

b007f43

Restrict threshold to between 0 <= x <= 1 on init

2451329

Add UT for AnswerRelevancyScorer

b35822e

Add UT for all new wrapped default scorers

6eff5cd

Edit JSONCorrectnessScorer init because it has an extra field that needs to be manually set.

6d9a907

Update e2e tests with new wrapped default scorer syntax

5e636ef

Remove unused imports

ecf7530

Generalize span level async evaluation to run with any scorer, custom or default.

dcc79aa

Update Pipfile

b1e0dc1

Update tracer test script with new default scorer

08fb199

JCamyre reviewed Jan 11, 2025

SecroLoL added 2 commits January 12, 2025 17:45

Remove dev packages from standard packages in Pipfile

ad1300d

Uncomment testing calls so all tests are run

03b2287

SecroLoL merged commit b6944b3 into main Jan 13, 2025

SecroLoL deleted the refactor_default_judges branch February 10, 2025 21:34
