Skip to content

Trace Datasets #291

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 3, 2025
Merged

Trace Datasets #291

merged 4 commits into from
Jun 3, 2025

Conversation

alanzhang25
Copy link
Collaborator

πŸ“ Summary

🎯 Purpose

πŸŽ₯ Demo of Changes

πŸ§ͺ Testing

βœ… Checklist

  • Self-review
  • Video demo of changes
  • Unit Tests and CI/CD tests are passing
  • Reviewers assigned

πŸ“Œ Linear Issue

✏️ Additional Notes

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @alanzhang25, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, gemini-code-assist here with a summary of this pull request titled "Trace Datasets".

The primary intent of this PR appears to be integrating trace data directly into the EvalDataset structure and ensuring this data can be persisted and retrieved via the dataset client. A significant part of the changes involves renaming the entries field within the Trace model and related code to trace_spans for better clarity, reflecting that these entries represent spans within a trace.

This change allows datasets to not only contain evaluation examples but also the associated trace information, which is crucial for understanding the execution flow and context during evaluations.

Highlights

  • Data Model Update: The Trace data model has been updated by renaming the entries field to trace_spans to more accurately describe its content.
  • Dataset Enhancement: The EvalDataset class now includes a list of Trace objects, allowing datasets to store trace data alongside evaluation examples.
  • Client Integration: The EvalDatasetClient has been modified to handle the pushing and pulling of trace data as part of the dataset payload to and from the backend service.
  • Code Consistency: References to the renamed field (entries -> trace_spans) have been updated across various parts of the codebase, including end-to-end tests, unit tests, and evaluation run logic.

Changelog

Click here to see the changelog
  • src/e2etests/test_judgee_traces_update.py
    • Updated test data dictionaries to use trace_spans instead of entries when simulating trace save requests.
  • src/e2etests/test_tracer.py
    • Removed a helper function print_trace_hierarchy.
  • src/judgeval/common/tracer.py
    • Updated the save method to use trace_spans when preparing the trace data for saving.
  • src/judgeval/data/datasets/dataset.py
    • Imported the Trace model.
    • Added a traces: List[Trace] field to the EvalDataset dataclass.
    • Added traces parameter to the __init__ method.
    • Added an add_trace method to append Trace objects.
    • Included traces in the __str__ representation.
  • src/judgeval/data/datasets/eval_dataset_client.py
    • Imported the Trace model.
    • Modified the push method to include the list of traces in the API request payload.
    • Modified the pull method to deserialize trace data from the API response and assign it to the dataset.
    • Added a print(payload) statement in the pull method (likely for debugging).
  • src/judgeval/data/trace.py
    • Renamed the entries field to trace_spans in the Trace Pydantic model.
  • src/judgeval/run_evaluation.py
    • Updated the reference from trace.entries[0] to trace.trace_spans[0] when setting expected tools during trace evaluation runs.
  • src/tests/common/test_tracer.py
    • Updated a docstring to reflect the change from entries to trace_spans.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with πŸ‘ and πŸ‘Ž on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


A list of spans,
Not entries, you see.
A dataset grows,
With trace history.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request primarily focuses on a consistent renaming of the entries field to trace_spans across the codebase, which improves clarity. Additionally, it introduces the capability for EvalDataset to manage a list of Trace objects, which is a useful enhancement.

The changes are generally well-implemented. I've identified a couple of areas for improvement, mainly concerning a debug print statement and a list manipulation pattern that could be more efficient.

Please also consider filling out the pull request description template (Summary, Purpose, Testing, etc.) to provide more context for reviewers and future reference.

Summary of Findings

  • List concatenation in add_trace: In src/judgeval/data/datasets/dataset.py, the add_trace method uses self.traces = self.traces + [t]. Using self.traces.append(t) would be more idiomatic and efficient, especially when adding many traces.
  • Debug print statement: A print(payload) statement was found in src/judgeval/data/datasets/eval_dataset_client.py within the pull method. This should be removed.
  • Debug log enhancement (low severity - not commented): In src/judgeval/data/datasets/dataset.py, the __init__ method's debug log debug(f"Initializing EvalDataset with {len(examples)} examples") could be updated to also mention the number of traces being initialized, e.g., debug(f"Initializing EvalDataset with {len(examples)} examples and {len(traces)} traces"). This was not added as a direct comment due to review settings.

Merge Readiness

The pull request introduces valuable changes by standardizing terminology and enhancing EvalDataset capabilities. However, there are a couple of medium-severity issues identified (an inefficient list append and a leftover debug print statement) that should be addressed before merging. Once these are resolved, the PR should be in good shape. As an AI, I am not authorized to approve pull requests; please ensure other reviewers approve these changes before merging.

@@ -220,6 +223,9 @@ def add_from_yaml(self, file_path: str) -> None:
def add_example(self, e: Example) -> None:
self.examples = self.examples + [e]
# TODO if we need to add rank, then we need to do it here

def add_trace(self, t: Trace) -> None:
self.traces = self.traces + [t]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Is there a reason for using list concatenation (self.traces + [t]) here instead of append()? Using append() is generally more efficient for adding elements to a list as it modifies the list in-place rather than creating a new list each time. For a large number of traces, this could have a minor performance impact.

Suggested change
self.traces = self.traces + [t]
self.traces.append(t)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -201,7 +202,9 @@ def pull(self, alias: str, project_name: str) -> EvalDataset:

info(f"Successfully pulled dataset with alias '{alias}'")
payload = response.json()
print(payload)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It looks like this print(payload) statement was added for debugging. Could we remove this before merging to keep the console output clean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@adivate2021 adivate2021 self-requested a review May 30, 2025 17:58
Copy link
Contributor

@adivate2021 adivate2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for the most part, just minor things from Gemini

@@ -220,6 +223,9 @@ def add_from_yaml(self, file_path: str) -> None:
def add_example(self, e: Example) -> None:
self.examples = self.examples + [e]
# TODO if we need to add rank, then we need to do it here

def add_trace(self, t: Trace) -> None:
self.traces = self.traces + [t]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -201,7 +202,9 @@ def pull(self, alias: str, project_name: str) -> EvalDataset:

info(f"Successfully pulled dataset with alias '{alias}'")
payload = response.json()
print(payload)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@JCamyre JCamyre added the In Review Task is currently being reviewed label Jun 2, 2025
@adivate2021 adivate2021 removed the In Review Task is currently being reviewed label Jun 3, 2025
Copy link
Contributor

@adivate2021 adivate2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@adivate2021 adivate2021 merged commit c81cd95 into staging Jun 3, 2025
4 of 5 checks passed
@adivate2021 adivate2021 deleted the az-trace-datasets branch June 3, 2025 21:03
adivate2021 added a commit that referenced this pull request Jun 6, 2025
* Staging and auto package release (#275)

* add auto package release

* add staging stuff

* add test trigger

* remove test trigger

* timeout + minor error handling

* gemini sugggestions

* Trace Usage Edits (#266)

* update litellm pyproject and fix e2etest (#276)

* update litellm pyproject

* add e2etest fix

* Agent Names added (#270)

* Agent Names added

* Fixed agent names for deep tracing

* Minor fixes

* Dummy commit to trigger staging checks

* Fix assert test e2e test (#277)

* fix assert test e2e test

* Update src/judgeval/judgment_client.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* add evaluation link (#279)

* add evaluation link

* fix run eval

* add e2etest for yaml

* Image file updates and social media badges (#274)

* updated logos and socials badges

* Update README.md

* Delete package-lock.json

---------

Co-authored-by: Minh Pham <62208322+Mandolaro@users.noreply.github.com>

* Testing updates (#269)

* updates and s3 e2e tests

* clear singleton before tests

* coverage, fix http request blocker

* test trigger

* if always block

* change working directory

* fix

* remove test trigger

* gemini suggestions

* add new trace column (#281)

* change timeout to use variable (#282)

* add code coverage for staging e2es (#285)

* Add ToolDependency Scorer and Parameter Checking (#253)

* Sequence to Trace Conversion

* Add agent names with a new decorator

* Sequence to Trace Conversion

* trace save

* comment out;

* Changes to yaml and agent file

* pydantic

* Added tool dependency metric

* Changed yaml

* Add support for multiple assert test calls

* Added parameter checking to ToolDependency metric

* Agent Names added

* Fixed tool dependency scorer with pydantic change

* Added internal scorer for parameter checking

* Support for action dependencies added

* Changed multi agent test to better show case where param checking helps

* Added error check for providing no tools when using param checking

* Remove unused parameter

---------

Co-authored-by: Alan <alanzhang2021@gmail.com>
Co-authored-by: Minh Pham <62208322+Mandolaro@users.noreply.github.com>
Co-authored-by: JCamyre <jwcamry03@gmail.com>

* modified deep tracing to chain with existing tracefuncs (#286)

* Refactor agent names (#287)

* Refactor agent names

* Minor fix

* Minor Gemini fix

* Minor fixes to e2etest and TraceSpan datatype

* Handle un-serializable types in Traces. Filter 'self' arguments. (#278)

* Example Created At (#284)

* fix logging for assert test trace (#290)

* fix logging for assert test trace

* fix agent

* E2E additions based on assigned judgeval files (#283)

* classifierscorer refactor, other e2es + uts additions

* removed e2e

* small updates

* fix uts

* remove print

* Make default project name uuid (#293)

* Raise warning for missing Org ID or API Key (#294)

* Raise warning for missing Org ID or API Key

* fix tests

* fix tests

* Agent state (#295)

* agent state into tracer

* model dump state before and after

* revert return

* add field mappings

* fix feedback

* Update Docs link (#297)

* Ahh/http error recording (#289)

* feat: tracer http errors as separate part of error output

* Update src/judgeval/common/tracer.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* chore: cleanup commented code

---------

Co-authored-by: Joseph S Camyre <68767176+JCamyre@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update README.md (#298)

* Update Docs link

* Add updated Discord link and add links to subsections.

* add workflow (#300)

* Example Check and User Prompt (#301)

* Example Check and User Prompt

* Test Fixes

* Required Params for Custom Scorers

* Trace Datasets (#291)

* Trace Datasets

* comment fix

* python class fix

---------

Co-authored-by: Aaryan Divate <44125685+adivate2021@users.noreply.github.com>

* Update blocked-pr.yaml (#302)

* removed src/demo (#299)

Co-authored-by: JCamyre <jwcamry03@gmail.com>

---------

Co-authored-by: Alan Zhang <97066812+alanzhang25@users.noreply.github.com>
Co-authored-by: Minh Pham <62208322+Mandolaro@users.noreply.github.com>
Co-authored-by: Aaryan Divate <44125685+adivate2021@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: shunuen0 <138656153+shunuen0@users.noreply.github.com>
Co-authored-by: Alan <alanzhang2021@gmail.com>
Co-authored-by: JCamyre <jwcamry03@gmail.com>
Co-authored-by: Joseph S Camyre <68767176+JCamyre@users.noreply.github.com>
Co-authored-by: rishi763 <55810666+rishi763@users.noreply.github.com>
Co-authored-by: Andrew Li <51971573+andrewli2403@users.noreply.github.com>
Co-authored-by: Abhishek Govindarasu <abhishekgovindarasu@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants