Sequence to Trace Conversion #254

alanzhang25 · 2025-05-18T00:35:22Z

📝 Summary

🎯 Purpose

🎥 Demo of Changes

🧪 Testing

✅ Checklist

Self-review
Video demo of changes
Unit Tests and CI/CD tests are passing
Reviewers assigned

📌 Linear Issue

✏️ Additional Notes

gemini-code-assist

Hello @alanzhang25, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, gemini-code-assist here to provide a summary of this pull request. This PR focuses on a significant refactor within the judgeval library, specifically transitioning from the Sequence concept for evaluation runs to using Trace objects. This involves renaming core evaluation functions and data structures, updating how evaluation data is handled and sent to the backend API, and removing deprecated Sequence-specific logic from dataset operations and tests. Additionally, the tracer has been updated to support an offline mode.

Highlights

Sequence Deprecation: The Sequence data structure and related functionality for defining evaluation runs have been removed or replaced.
Trace-based Evaluation: Evaluation runs now utilize Trace objects (specifically TraceSpans) as the primary unit for structured evaluation, particularly when evaluating agent functions that produce traces.
API Endpoint Updates: The API endpoint for running sequence evaluations (/evaluate_sequence/) has been replaced with a new endpoint for trace evaluations (/evaluate_trace/). The endpoint for retrieving sequences from traces has also been removed.
Dataset Simplification: Dataset operations and tests related to storing and appending Sequence objects have been removed, simplifying the dataset model to primarily handle Examples.
Tracer Offline Mode: The Tracer now includes an offline_mode to control whether trace data is sent to the backend API during execution, which is utilized in the new trace evaluation flow.
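
The offline-mode behaviour in the last highlight can be pictured with a small sketch. This is not judgeval's actual `Tracer`: `SketchTracer`, its `upload` callback, and the local `traces` buffer are hypothetical names, used only to show a backend call gated on an `offline_mode` flag.

```python
from typing import Any, Callable, Dict, List


class SketchTracer:
    """Hypothetical stand-in for a tracer with an offline switch."""

    def __init__(self, upload: Callable[[Dict[str, Any]], None], offline_mode: bool = False):
        self.upload = upload              # assumed callback that would reach the backend
        self.offline_mode = offline_mode  # mirrors the flag named in the PR summary
        self.traces: List[Dict[str, Any]] = []  # assumed local buffer of saved traces

    def save(self, trace_id: str, entries: List[Dict[str, Any]]) -> Dict[str, Any]:
        payload = {"trace_id": trace_id, "entries": entries, "offline_mode": self.offline_mode}
        self.traces.append(payload)       # always kept locally
        if not self.offline_mode:
            self.upload(payload)          # only sent when the tracer is online
        return payload


sent: List[Dict[str, Any]] = []
online = SketchTracer(upload=sent.append)
offline = SketchTracer(upload=sent.append, offline_mode=True)

online.save("t1", [{"function": "plan_trip"}])
offline.save("t2", [{"function": "plan_trip"}])
```

In this sketch only `t1` reaches `sent`, while both tracers keep their payloads locally so that a later evaluation step could read them.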

Changelog

src/demo/sequence_test.py
- Removed old sequence evaluation logic (lines 1-16).
- Removed a second block of old sequence evaluation logic (lines 1-46).
- Modified search_tavily to return a hardcoded string instead of calling the Tavily API (lines 21-24).
- Commented out the OpenAI API call in create_travel_plan and returned a hardcoded string (lines 84-92).
- Added project_name="travel_agent_demo" to the judgment.assert_test call (line 145).
src/e2etests/test_all_scorers.py
- Removed import of Sequence (line 26).
- Removed the test_derailment_scorer function which used Sequence (lines 536-560).
src/e2etests/test_dataset_operations.py
- Removed import of Sequence (line 11).
- Removed the test_dataset_with_sequence function (lines 41-59).
- Removed the test_append_sequence_dataset function (lines 136-179).
src/judgeval/common/tracer.py
- Added offline_mode to the trace data payload in TraceSpan.save (line 1057).
- Added a check in TraceSpan.save to only call save_trace if not in offline mode (line 1062).
- Added offline_mode parameter to Tracer.__init__ (line 1317).
- Stored offline_mode as an instance variable in Tracer.__init__ (line 1358).
src/judgeval/constants.py
- Renamed JUDGMENT_SEQUENCE_EVAL_API_URL to JUDGMENT_TRACE_EVAL_API_URL (line 43).
- Removed JUDGMENT_DATASETS_APPEND_SEQUENCES_API_URL (line 46).
- Removed JUDGMENT_RETRIEVE_SEQUENCE_FROM_TRACE_API_URL (line 53).
src/judgeval/data/__init__.py
- Removed import of Sequence (line 5).
- Removed Sequence from the __all__ list (line 17).
src/judgeval/data/datasets/dataset.py
- Removed import of Sequence (line 10).
- Removed sequences field from EvalDataset dataclass (line 16).
- Removed sequences parameter and assignment from EvalDataset.__init__ (lines 24, 31).
- Removed add_sequence method (lines 227-228).
- Removed sequences from __str__ representation (line 316).
src/judgeval/data/datasets/eval_dataset_client.py
- Removed import of Sequence (line 17).
- Removed JUDGMENT_DATASETS_APPEND_SEQUENCES_API_URL constant import (line 10).
- Removed sequences field and is_sequence from the payload in push method (lines 62-63).
- Removed append_sequences method (lines 155-210).
- Removed assignment of sequences from the pulled payload in pull method (line 265).
src/judgeval/data/example.py
- Removed sequence_order field from Example model (line 40).
src/judgeval/data/result.py
- Replaced import of Sequence with TraceSpan (line 6).
- Updated the type hint for data_object to include TraceSpan instead of Sequence (line 27).
- Updated the type hint for data_object in generate_scoring_result to include TraceSpan instead of Sequence (line 52).
src/judgeval/data/sequence.py
- Deleted the entire file defining the Sequence class.
src/judgeval/data/trace.py
- Added expected_tools and additional_metadata fields to TraceSpan (lines 16-17).
src/judgeval/data/trace_run.py
- Replaced import of Sequence with Trace (line 4).
- Renamed SequenceRun class to TraceRun (line 10).
- Updated docstring to refer to traces instead of sequences (line 17).
- Renamed sequences field to traces (line 32).
src/judgeval/judgment_client.py
- Replaced import of Sequence with Trace (line 15).
- Replaced import of SequenceRun with TraceRun (line 28).
- Renamed run_sequence_eval to run_trace_eval in imports (line 26).
- Renamed run_sequence_evaluation method to run_trace_evaluation (line 108).
- Updated parameter name from sequences to traces in run_trace_evaluation (line 112).
- Updated default eval_run_name in run_trace_evaluation (line 117).
- Updated parameter name from sequences to traces in validation checks within run_trace_evaluation (lines 137, 140).
- Renamed sequence_run variable to trace_run in run_trace_evaluation (line 143).
- Updated parameter name from sequences to traces when creating TraceRun object (line 146).
- Updated call to run_trace_eval and variable name from sequence_run to trace_run (line 155).
- Updated error message to refer to TraceRun (line 157).
- Removed append_sequence_dataset method (lines 248-252).
- Changed call in assert_test from run_sequence_evaluation to run_trace_evaluation (line 520).
src/judgeval/run_evaluation.py
- Removed import of Sequence (line 16).
- Removed JUDGMENT_SEQUENCE_EVAL_API_URL and JUDGMENT_RETRIEVE_SEQUENCE_FROM_TRACE_API_URL constants (lines 28, 32).
- Replaced import of SequenceRun with TraceRun (line 40).
- Renamed execute_api_sequence_eval function to execute_api_trace_eval (line 99).
- Renamed parameter from sequence_run to trace_run in execute_api_trace_eval (line 99).
- Updated payload creation and API URL in execute_api_trace_eval (lines 106-114).
- Updated type hint for run parameter in log_evaluation_results to include TraceRun instead of SequenceRun (line 283).
- Removed retrieve_sequence_from_trace function (lines 330-372).
- Renamed run_sequence_eval function to run_trace_eval (line 371).
- Renamed parameter from sequence_run to trace_run in run_trace_eval (line 371).
- Updated checks for override, log_results, and append to use trace_run (lines 373, 381).
- Renamed new_sequences list to new_traces (line 392).
- Set tracer.offline_mode = True when function and tracer are provided (line 393).
- Modified logic to iterate over tracer.traces, create Trace objects, set expected_tools on the root span, and append to new_traces (lines 399-403).
- Assigned new_traces to trace_run.traces (line 404).
- Removed loop setting scorers on sequences (lines 453-454).
- Updated call to execute_api_trace_eval and variable name from sequence_run to trace_run (line 410).
- Updated error message to refer to TraceRun (line 417).
- Updated call to log_evaluation_results and variable name from sequence_run to trace_run (line 423).
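
The `run_trace_eval` entries above describe a flow in which the tracer is switched to offline mode, the function under test is run, and the recorded traces are converted and attached to the run. A rough sketch of that flow follows. Every class and helper here (`SketchSpan`, `SketchTrace`, `SketchTracer`, `SketchTraceRun`, `collect_traces`, `fake_agent`) is hypothetical, and the assumption that each call records exactly one raw trace, with the root span first, is mine rather than something the changelog states.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class SketchSpan:
    function: str
    expected_tools: Optional[List[str]] = None


@dataclass
class SketchTrace:
    trace_id: str
    entries: List[SketchSpan]


@dataclass
class SketchTracer:
    offline_mode: bool = False
    traces: List[Dict[str, Any]] = field(default_factory=list)


@dataclass
class SketchTraceRun:
    traces: List[SketchTrace] = field(default_factory=list)


def collect_traces(
    function: Callable[[str], Any],
    tracer: SketchTracer,
    examples: List[Dict[str, Any]],
    trace_run: SketchTraceRun,
) -> SketchTraceRun:
    tracer.offline_mode = True  # keep traces local while the function runs
    for example in examples:
        function(example["input"])  # assumed to record one raw trace per call
    new_traces = []
    for raw, example in zip(tracer.traces, examples):
        trace = SketchTrace(
            trace_id=raw["trace_id"],
            entries=[SketchSpan(**span) for span in raw["entries"]],
        )
        # Attach the example's expected tools to the root span (assumed to be first).
        trace.entries[0].expected_tools = example.get("expected_tools")
        new_traces.append(trace)
    trace_run.traces = new_traces
    return trace_run


tracer = SketchTracer()


def fake_agent(prompt: str) -> str:
    # Stand-in for an instrumented agent: records a root span and one tool span.
    tracer.traces.append({
        "trace_id": f"trace-{len(tracer.traces)}",
        "entries": [{"function": "fake_agent"}, {"function": "search"}],
    })
    return f"plan for {prompt}"


examples = [
    {"input": "Paris", "expected_tools": ["search"]},
    {"input": "Tokyo", "expected_tools": []},
]
run = collect_traces(fake_agent, tracer, examples, SketchTraceRun())
```

Pairing traces with examples by position is the simplest choice for a sketch; it would break if a call recorded zero or several traces, which is one reason the test-coverage concern raised in the review is worth taking seriously.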

From sequence to trace,
A new evaluation space.
Code flows, spans bloom,
Dispelling the gloom,
For bugs, there's no hiding place.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

The pull request successfully transitions the evaluation framework from using Sequence objects to Trace objects, particularly for evaluating the execution of a function. The changes involve renaming classes and methods, updating data models, and removing deprecated Sequence-specific functionality across the codebase. The implementation appears consistent and well-executed.

Summary of Findings

Test Coverage for Trace Evaluation: Several tests related to Sequence evaluation and datasets containing Sequence objects have been removed. While this is correct given the PR's goal, there appear to be gaps in test coverage for the new Trace-based evaluation functionality, particularly for specific scorers like DerailmentScorer and for dataset operations involving traces.

Merge Readiness

The core logic changes for converting from Sequence to Trace evaluation seem correct and well-implemented. However, the absence of corresponding tests for the new Trace-based functionality, especially for dataset operations and specific scorers, means there's reduced confidence in the completeness and correctness of the new features. I recommend addressing the test coverage gaps for Trace evaluation and dataset operations before merging. I am unable to approve this pull request directly; please have other reviewers review and approve this code before merging.

JCamyre

Quick questions - looks good to me! Gonna run some tests with your PR's

src/judgeval/common/tracer.py

src/judgeval/data/result.py

src/judgeval/data/trace_run.py

JCamyre · 2025-05-19T04:02:44Z

src/judgeval/evaluation_run.py

@@ -79,7 +79,7 @@ def validate_eval_name(cls, v, values):
            raise ValueError("Eval name is required when log_results is True. Please include the eval_run_name argument.")
        return v

-    @field_validator('examples', mode='before')
+    @field_validator('examples')


Why can we remove the mode='before'?

This allows building the pydantic object (with the **trace) first, so that the validator doesn't run before the input has been converted.

Let's think if this is best practice
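
For anyone weighing the question above, here is a minimal sketch of what the two modes see, assuming pydantic v2 is installed. `Example`, `BeforeRun`, and `AfterRun` are illustrative models, not judgeval's classes. A `mode='before'` validator receives the raw input, while the default (`'after'`) validator receives the value once pydantic has already parsed it into the annotated type.

```python
from typing import Any, Dict, List

from pydantic import BaseModel, field_validator


class Example(BaseModel):
    input: str


seen: Dict[str, List[str]] = {}  # records the item types each validator observed


class BeforeRun(BaseModel):
    examples: List[Example]

    @field_validator("examples", mode="before")
    @classmethod
    def record_before(cls, v: Any) -> Any:
        # Runs on the raw input, before pydantic builds Example objects.
        seen["before"] = [type(item).__name__ for item in v]
        return v


class AfterRun(BaseModel):
    examples: List[Example]

    @field_validator("examples")  # default mode is 'after'
    @classmethod
    def record_after(cls, v: List[Example]) -> List[Example]:
        # Runs after pydantic has converted each dict into an Example.
        seen["after"] = [type(item).__name__ for item in v]
        return v


raw = {"examples": [{"input": "hi"}]}
BeforeRun(**raw)
AfterRun(**raw)
```

With dict input, the before-mode validator sees `dict` items and the after-mode validator sees `Example` instances. Whether relying on after-mode is the better practice here depends on whether the validator needs to inspect or reshape raw input, which this sketch does not settle.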

JCamyre · 2025-05-19T06:41:25Z

Can you add the Tool calling metric to the docs? Work off of Galen's branch since it's not merged in yet: doc_update-ethan
Thanks

JCamyre

LGTM

gemini-code-assist bot reviewed May 18, 2025

gemini-code-assist bot suggested changes May 18, 2025

alanzhang25 force-pushed the az-sequence-to-trace branch from 6de92f3 to eb78eb3 May 18, 2025 18:28

alanzhang25 marked this pull request as ready for review May 18, 2025 18:29

JCamyre self-requested a review May 19, 2025 02:57

JCamyre added the In Review label May 19, 2025

JCamyre requested changes May 19, 2025

alanzhang25 added 5 commits May 19, 2025 11:30

- Sequence to Trace Conversion (53f92e6)
- trace save (f424c9f)
- comment out; (8712477)
- pydantic (848766a)
- updates (9f6480d)

alanzhang25 force-pushed the az-sequence-to-trace branch from 3684de3 to 9f6480d May 19, 2025 18:30

Merge branch 'main' into az-sequence-to-trace

08bba42

JCamyre approved these changes May 19, 2025

JCamyre merged commit 186a33b into main May 19, 2025
3 of 4 checks passed

JCamyre deleted the az-sequence-to-trace branch May 19, 2025 23:16
