Span-level evals additional features #34

JCamyre · 2025-01-08T22:29:07Z

Add a tag for each span, specifying which type of span it is: LLM call, tool call, regular span, etc.
Automatically assign project names and unique eval run names to each span evaluation ran
Fix depth count.

…tions, tools, etc. Tweak condense() logic to properly support structure.

…_type shows up in the final trace.

…ame to trace.

… Add type hinting for Tracer fields.

…races.

… the results after logged (important as it contains example_id).

…val API endpoint.

… example page.

…vel-evals

…-evals

…joseph/span-level-evals

JCamyre

Run UT's

JCamyre

Rerun UT's

JCamyre

Rerun UT's.

JCamyre

Rerun UT's.

SecroLoL · 2025-01-21T06:32:44Z

tests/common/test_tracer.py

-        {"type": "output", "function": "test_func", "depth": 1, "timestamp": 1.2, "output": "result"},
-        {"type": "exit", "function": "test_func", "depth": 0, "timestamp": 2.0},
+        {"type": "enter", "function": "test_func", "depth": base_depth, "timestamp": 1.0},
+        {"type": "input", "function": "test_func", "depth": base_depth + 1, "timestamp": 1.1, "inputs": {"x": 1}},


SecroLoL · 2025-01-21T06:33:06Z

tests/common/test_tracer.py

-    mock_response.usage = MagicMock(prompt_tokens=10, completion_tokens=20, total_tokens=30)
-    client.chat.completions.create = MagicMock(return_value=mock_response)
+    mock_completion = MagicMock()
+    mock_completion.choices = [MagicMock(message=MagicMock(content="test response"))]


SecroLoL

LGTM. Nice PR -- I like the span names change. Seems like updating the UTs must've been annoying, thanks for doing this

JCamyre added 30 commits January 7, 2025 19:32

Small changes.

f51cbf3

Add a span_type field to traces, to specify between LLM calls, evalua…

1bd3197

…tions, tools, etc. Tweak condense() logic to properly support structure.

Pass span_type's into @judgment.observe()'s.

cf2adbb

Fix span_types not being passed in for all observe() cases.

1d3f9ff

Fix depth count issues with spans.

dcff0f9

Add span_type to the TraceEntry 'to dictionary' function so that span…

ccc9171

…_type shows up in the final trace.

Remove debugging print statements.

fa9bec7

Update prompt_scorer notebook docs to proper python version.

c57a0c7

Add e2e test for editing, updating, and pushing a classifier scorer.

e3d272a

Private functions for e2etests/test_prompt_scoring.py.

7fe7337

Fix unit tests which was accessing old private method names.

0bb2f91

Privatize methods and use new method name.

5ac335d

Update unit test and unit test mock object to use private method names.

7d19cd1

More privatization.

c48d072

Add update functions for ClassifierScorer.

1c33164

Add Judgment Client method to push classifier scorers from SDK side.

941ecbb

Add sleep to make llm_call function more realistic. Pass in project n…

9a9f28e

…ame to trace.

Add project name field to traces.

ea895ef

Remove judgment client test changes.

78a5857

Add automatic eval run name generation. Don't allow empty Trace name.…

22f8f17

… Add type hinting for Tracer fields.

Change trace and project name. Specify overwrite kwaarg.

fe07188

Add and pass arguments for logic relating to saving and overwriting t…

d756a4f

…races.

Add error handling from save trace API call.

8d214d0

Remove logic related to actual_eval_run_name. Add logic for receiving…

5be3b0d

… the results after logged (important as it contains example_id).

Add comments for pull_eval. Properly handle receiving updated fetch e…

f526528

…val API endpoint.

Add new fields to ScoringResult, needed for linking between trace and…

cc66f54

… example page.

Merge branch 'joseph/improve-trace-pages' into joseph/span-level-evals

4aa109c

Merge branch 'joseph/simplify-classifier-scorers' into joseph/span-le…

c9043ff

…vel-evals

Merge branch 'joseph/eval-run-name-uniqueness' into joseph/span-level…

c3f3cca

…-evals

Add demo folder. Add Patronus tracing workflow for comparison in demos.

5e00fa0

Add Patronus library, needed for Patronus demo.

684b8ce

JCamyre force-pushed the joseph/span-level-evals branch from c44abe8 to 684b8ce Compare January 19, 2025 19:48

JCamyre added 4 commits January 19, 2025 11:49

Make tracer test evals make more contextual sense.

8829ab9

Remove print statement.

74b62a0

Merge branch 'main' into joseph/span-level-evals

951858a

Merge branch 'main' of https://github.com/JudgmentLabs/judgeval into …

1d20648

…joseph/span-level-evals

JCamyre commented Jan 19, 2025

View reviewed changes

Fix failing UT's.

b01c104

JCamyre commented Jan 20, 2025

View reviewed changes

Fix test_condense_trace UT.

22ebf64

JCamyre commented Jan 20, 2025

View reviewed changes

SecroLoL reviewed Jan 21, 2025

View reviewed changes

SecroLoL merged commit 4587969 into main Jan 21, 2025
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Span-level evals additional features #34

Span-level evals additional features #34

Uh oh!

JCamyre commented Jan 8, 2025 •

edited

Loading

Uh oh!

JCamyre left a comment

Uh oh!

JCamyre left a comment

Uh oh!

JCamyre left a comment

Uh oh!

JCamyre left a comment

Uh oh!

SecroLoL Jan 21, 2025

Uh oh!

SecroLoL Jan 21, 2025

Uh oh!

SecroLoL left a comment

Uh oh!

Uh oh!

Uh oh!

Span-level evals additional features #34

Span-level evals additional features #34

Uh oh!

Conversation

JCamyre commented Jan 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JCamyre left a comment

Choose a reason for hiding this comment

Uh oh!

JCamyre left a comment

Choose a reason for hiding this comment

Uh oh!

JCamyre left a comment

Choose a reason for hiding this comment

Uh oh!

JCamyre left a comment

Choose a reason for hiding this comment

Uh oh!

SecroLoL Jan 21, 2025

Choose a reason for hiding this comment

Uh oh!

SecroLoL Jan 21, 2025

Choose a reason for hiding this comment

Uh oh!

SecroLoL left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

JCamyre commented Jan 8, 2025 •

edited

Loading