Trace Datasets #291

alanzhang25 · 2025-05-30T00:31:45Z

📝 Summary

🎯 Purpose

🎥 Demo of Changes

🧪 Testing

✅ Checklist

Self-review
Video demo of changes
Unit Tests and CI/CD tests are passing
Reviewers assigned

📌 Linear Issue

✏️ Additional Notes

gemini-code-assist

Hello @alanzhang25, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, gemini-code-assist here with a summary of this pull request titled "Trace Datasets".

The primary intent of this PR appears to be integrating trace data directly into the EvalDataset structure and ensuring this data can be persisted and retrieved via the dataset client. A significant part of the changes involves renaming the entries field within the Trace model and related code to trace_spans for better clarity, reflecting that these entries represent spans within a trace.

This change allows datasets to not only contain evaluation examples but also the associated trace information, which is crucial for understanding the execution flow and context during evaluations.

Highlights

Data Model Update: The Trace data model has been updated by renaming the entries field to trace_spans to more accurately describe its content.
Dataset Enhancement: The EvalDataset class now includes a list of Trace objects, allowing datasets to store trace data alongside evaluation examples.
Client Integration: The EvalDatasetClient has been modified to handle the pushing and pulling of trace data as part of the dataset payload to and from the backend service.
Code Consistency: References to the renamed field (entries -> trace_spans) have been updated across various parts of the codebase, including end-to-end tests, unit tests, and evaluation run logic.

Changelog

Click here to see the changelog

src/e2etests/test_judgee_traces_update.py
- Updated test data dictionaries to use trace_spans instead of entries when simulating trace save requests.
src/e2etests/test_tracer.py
- Removed a helper function print_trace_hierarchy.
src/judgeval/common/tracer.py
- Updated the save method to use trace_spans when preparing the trace data for saving.
src/judgeval/data/datasets/dataset.py
- Imported the Trace model.
- Added a traces: List[Trace] field to the EvalDataset dataclass.
- Added traces parameter to the __init__ method.
- Added an add_trace method to append Trace objects.
- Included traces in the __str__ representation.
src/judgeval/data/datasets/eval_dataset_client.py
- Imported the Trace model.
- Modified the push method to include the list of traces in the API request payload.
- Modified the pull method to deserialize trace data from the API response and assign it to the dataset.
- Added a print(payload) statement in the pull method (likely for debugging).
src/judgeval/data/trace.py
- Renamed the entries field to trace_spans in the Trace Pydantic model.
src/judgeval/run_evaluation.py
- Updated the reference from trace.entries[0] to trace.trace_spans[0] when setting expected tools during trace evaluation runs.
src/tests/common/test_tracer.py
- Updated a docstring to reflect the change from entries to trace_spans.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

A list of spans,
Not entries, you see.
A dataset grows,
With trace history.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request primarily focuses on a consistent renaming of the entries field to trace_spans across the codebase, which improves clarity. Additionally, it introduces the capability for EvalDataset to manage a list of Trace objects, which is a useful enhancement.

The changes are generally well-implemented. I've identified a couple of areas for improvement, mainly concerning a debug print statement and a list manipulation pattern that could be more efficient.

Please also consider filling out the pull request description template (Summary, Purpose, Testing, etc.) to provide more context for reviewers and future reference.

Summary of Findings

List concatenation in add_trace: In src/judgeval/data/datasets/dataset.py, the add_trace method uses self.traces = self.traces + [t]. Using self.traces.append(t) would be more idiomatic and efficient, especially when adding many traces.
Debug print statement: A print(payload) statement was found in src/judgeval/data/datasets/eval_dataset_client.py within the pull method. This should be removed.
Debug log enhancement (low severity - not commented): In src/judgeval/data/datasets/dataset.py, the __init__ method's debug log debug(f"Initializing EvalDataset with {len(examples)} examples") could be updated to also mention the number of traces being initialized, e.g., debug(f"Initializing EvalDataset with {len(examples)} examples and {len(traces)} traces"). This was not added as a direct comment due to review settings.

Merge Readiness

The pull request introduces valuable changes by standardizing terminology and enhancing EvalDataset capabilities. However, there are a couple of medium-severity issues identified (an inefficient list append and a leftover debug print statement) that should be addressed before merging. Once these are resolved, the PR should be in good shape. As an AI, I am not authorized to approve pull requests; please ensure other reviewers approve these changes before merging.

gemini-code-assist · 2025-05-30T00:32:45Z

src/judgeval/data/datasets/dataset.py

@@ -220,6 +223,9 @@ def add_from_yaml(self, file_path: str) -> None:
    def add_example(self, e: Example) -> None:
        self.examples = self.examples + [e]
        # TODO if we need to add rank, then we need to do it here
+
+    def add_trace(self, t: Trace) -> None:
+        self.traces = self.traces + [t]


Is there a reason for using list concatenation (self.traces + [t]) here instead of append()? Using append() is generally more efficient for adding elements to a list as it modifies the list in-place rather than creating a new list each time. For a large number of traces, this could have a minor performance impact.

Suggested change

self.traces = self.traces + [t]

self.traces.append(t)

gemini-code-assist · 2025-05-30T00:32:45Z

src/judgeval/data/datasets/eval_dataset_client.py

@@ -201,7 +202,9 @@ def pull(self, alias: str, project_name: str) -> EvalDataset:

                info(f"Successfully pulled dataset with alias '{alias}'")
                payload = response.json()
+                print(payload)


It looks like this print(payload) statement was added for debugging. Could we remove this before merging to keep the console output clean?

adivate2021

Looks good for the most part, just minor things from Gemini

adivate2021 · 2025-05-30T22:21:55Z

src/judgeval/data/datasets/dataset.py

@@ -220,6 +223,9 @@ def add_from_yaml(self, file_path: str) -> None:
    def add_example(self, e: Example) -> None:
        self.examples = self.examples + [e]
        # TODO if we need to add rank, then we need to do it here
+
+    def add_trace(self, t: Trace) -> None:
+        self.traces = self.traces + [t]


adivate2021 · 2025-05-30T22:22:02Z

src/judgeval/data/datasets/eval_dataset_client.py

@@ -201,7 +202,9 @@ def pull(self, alias: str, project_name: str) -> EvalDataset:

                info(f"Successfully pulled dataset with alias '{alias}'")
                payload = response.json()
+                print(payload)


adivate2021

LGTM

* Staging and auto package release (#275) * add auto package release * add staging stuff * add test trigger * remove test trigger * timeout + minor error handling * gemini sugggestions * Trace Usage Edits (#266) * update litellm pyproject and fix e2etest (#276) * update litellm pyproject * add e2etest fix * Agent Names added (#270) * Agent Names added * Fixed agent names for deep tracing * Minor fixes * Dummy commit to trigger staging checks * Fix assert test e2e test (#277) * fix assert test e2e test * Update src/judgeval/judgment_client.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * add evaluation link (#279) * add evaluation link * fix run eval * add e2etest for yaml * Image file updates and social media badges (#274) * updated logos and socials badges * Update README.md * Delete package-lock.json --------- Co-authored-by: Minh Pham <62208322+Mandolaro@users.noreply.github.com> * Testing updates (#269) * updates and s3 e2e tests * clear singleton before tests * coverage, fix http request blocker * test trigger * if always block * change working directory * fix * remove test trigger * gemini suggestions * add new trace column (#281) * change timeout to use variable (#282) * add code coverage for staging e2es (#285) * Add ToolDependency Scorer and Parameter Checking (#253) * Sequence to Trace Conversion * Add agent names with a new decorator * Sequence to Trace Conversion * trace save * comment out; * Changes to yaml and agent file * pydantic * Added tool dependency metric * Changed yaml * Add support for multiple assert test calls * Added parameter checking to ToolDependency metric * Agent Names added * Fixed tool dependency scorer with pydantic change * Added internal scorer for parameter checking * Support for action dependencies added * Changed multi agent test to better show case where param checking helps * Added error check for providing no tools when using param checking * Remove unused parameter --------- Co-authored-by: Alan <alanzhang2021@gmail.com> Co-authored-by: Minh Pham <62208322+Mandolaro@users.noreply.github.com> Co-authored-by: JCamyre <jwcamry03@gmail.com> * modified deep tracing to chain with existing tracefuncs (#286) * Refactor agent names (#287) * Refactor agent names * Minor fix * Minor Gemini fix * Minor fixes to e2etest and TraceSpan datatype * Handle un-serializable types in Traces. Filter 'self' arguments. (#278) * Example Created At (#284) * fix logging for assert test trace (#290) * fix logging for assert test trace * fix agent * E2E additions based on assigned judgeval files (#283) * classifierscorer refactor, other e2es + uts additions * removed e2e * small updates * fix uts * remove print * Make default project name uuid (#293) * Raise warning for missing Org ID or API Key (#294) * Raise warning for missing Org ID or API Key * fix tests * fix tests * Agent state (#295) * agent state into tracer * model dump state before and after * revert return * add field mappings * fix feedback * Update Docs link (#297) * Ahh/http error recording (#289) * feat: tracer http errors as separate part of error output * Update src/judgeval/common/tracer.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * chore: cleanup commented code --------- Co-authored-by: Joseph S Camyre <68767176+JCamyre@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update README.md (#298) * Update Docs link * Add updated Discord link and add links to subsections. * add workflow (#300) * Example Check and User Prompt (#301) * Example Check and User Prompt * Test Fixes * Required Params for Custom Scorers * Trace Datasets (#291) * Trace Datasets * comment fix * python class fix --------- Co-authored-by: Aaryan Divate <44125685+adivate2021@users.noreply.github.com> * Update blocked-pr.yaml (#302) * removed src/demo (#299) Co-authored-by: JCamyre <jwcamry03@gmail.com> --------- Co-authored-by: Alan Zhang <97066812+alanzhang25@users.noreply.github.com> Co-authored-by: Minh Pham <62208322+Mandolaro@users.noreply.github.com> Co-authored-by: Aaryan Divate <44125685+adivate2021@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: shunuen0 <138656153+shunuen0@users.noreply.github.com> Co-authored-by: Alan <alanzhang2021@gmail.com> Co-authored-by: JCamyre <jwcamry03@gmail.com> Co-authored-by: Joseph S Camyre <68767176+JCamyre@users.noreply.github.com> Co-authored-by: rishi763 <55810666+rishi763@users.noreply.github.com> Co-authored-by: Andrew Li <51971573+andrewli2403@users.noreply.github.com> Co-authored-by: Abhishek Govindarasu <abhishekgovindarasu@gmail.com>

gemini-code-assist bot reviewed May 30, 2025

View reviewed changes

gemini-code-assist bot suggested changes May 30, 2025

View reviewed changes

adivate2021 self-requested a review May 30, 2025 17:58

adivate2021 requested changes May 30, 2025

View reviewed changes

JCamyre added the In Review Task is currently being reviewed label Jun 2, 2025

alanzhang25 added 3 commits June 1, 2025 22:57

Trace Datasets

96f7233

comment fix

31d3a54

python class fix

669895b

alanzhang25 force-pushed the az-trace-datasets branch from 6f80dc1 to 669895b Compare June 2, 2025 06:09

Merge branch 'staging' into az-trace-datasets

7db30d1

adivate2021 removed the In Review Task is currently being reviewed label Jun 3, 2025

adivate2021 approved these changes Jun 3, 2025

View reviewed changes

adivate2021 merged commit c81cd95 into staging Jun 3, 2025
4 of 5 checks passed

adivate2021 deleted the az-trace-datasets branch June 3, 2025 21:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Trace Datasets #291

Trace Datasets #291

Uh oh!

alanzhang25 commented May 30, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot May 30, 2025

Uh oh!

adivate2021 May 30, 2025

Uh oh!

gemini-code-assist bot May 30, 2025

Uh oh!

adivate2021 May 30, 2025

Uh oh!

adivate2021 left a comment

Uh oh!

adivate2021 May 30, 2025

Uh oh!

adivate2021 May 30, 2025

Uh oh!

adivate2021 left a comment

Uh oh!

Uh oh!

Uh oh!

Trace Datasets #291

Trace Datasets #291

Uh oh!

Conversation

alanzhang25 commented May 30, 2025

📝 Summary

🎯 Purpose

🎥 Demo of Changes

🧪 Testing

✅ Checklist

📌 Linear Issue

✏️ Additional Notes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Summary of Changes

Highlights

Changelog

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Summary of Findings

Merge Readiness

Uh oh!

gemini-code-assist bot May 30, 2025

Choose a reason for hiding this comment

Uh oh!

adivate2021 May 30, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot May 30, 2025

Choose a reason for hiding this comment

Uh oh!

adivate2021 May 30, 2025

Choose a reason for hiding this comment

Uh oh!

adivate2021 left a comment

Choose a reason for hiding this comment

Uh oh!

adivate2021 May 30, 2025

Choose a reason for hiding this comment

Uh oh!

adivate2021 May 30, 2025

Choose a reason for hiding this comment

Uh oh!

adivate2021 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!