-
Notifications
You must be signed in to change notification settings - Fork 82
Trace Datasets #291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trace Datasets #291
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @alanzhang25, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, gemini-code-assist here with a summary of this pull request titled "Trace Datasets".
The primary intent of this PR appears to be integrating trace data directly into the EvalDataset
structure and ensuring this data can be persisted and retrieved via the dataset client. A significant part of the changes involves renaming the entries
field within the Trace
model and related code to trace_spans
for better clarity, reflecting that these entries represent spans within a trace.
This change allows datasets to not only contain evaluation examples but also the associated trace information, which is crucial for understanding the execution flow and context during evaluations.
Highlights
- Data Model Update: The
Trace
data model has been updated by renaming theentries
field totrace_spans
to more accurately describe its content. - Dataset Enhancement: The
EvalDataset
class now includes a list ofTrace
objects, allowing datasets to store trace data alongside evaluation examples. - Client Integration: The
EvalDatasetClient
has been modified to handle the pushing and pulling of trace data as part of the dataset payload to and from the backend service. - Code Consistency: References to the renamed field (
entries
->trace_spans
) have been updated across various parts of the codebase, including end-to-end tests, unit tests, and evaluation run logic.
Changelog
Click here to see the changelog
- src/e2etests/test_judgee_traces_update.py
- Updated test data dictionaries to use
trace_spans
instead ofentries
when simulating trace save requests.
- Updated test data dictionaries to use
- src/e2etests/test_tracer.py
- Removed a helper function
print_trace_hierarchy
.
- Removed a helper function
- src/judgeval/common/tracer.py
- Updated the
save
method to usetrace_spans
when preparing the trace data for saving.
- Updated the
- src/judgeval/data/datasets/dataset.py
- Imported the
Trace
model. - Added a
traces: List[Trace]
field to theEvalDataset
dataclass. - Added
traces
parameter to the__init__
method. - Added an
add_trace
method to appendTrace
objects. - Included
traces
in the__str__
representation.
- Imported the
- src/judgeval/data/datasets/eval_dataset_client.py
- Imported the
Trace
model. - Modified the
push
method to include the list of traces in the API request payload. - Modified the
pull
method to deserialize trace data from the API response and assign it to the dataset. - Added a
print(payload)
statement in thepull
method (likely for debugging).
- Imported the
- src/judgeval/data/trace.py
- Renamed the
entries
field totrace_spans
in theTrace
Pydantic model.
- Renamed the
- src/judgeval/run_evaluation.py
- Updated the reference from
trace.entries[0]
totrace.trace_spans[0]
when setting expected tools during trace evaluation runs.
- Updated the reference from
- src/tests/common/test_tracer.py
- Updated a docstring to reflect the change from
entries
totrace_spans
.
- Updated a docstring to reflect the change from
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with π and π on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
A list of spans,
Not entries, you see.
A dataset grows,
With trace history.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. β©
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request primarily focuses on a consistent renaming of the entries
field to trace_spans
across the codebase, which improves clarity. Additionally, it introduces the capability for EvalDataset
to manage a list of Trace
objects, which is a useful enhancement.
The changes are generally well-implemented. I've identified a couple of areas for improvement, mainly concerning a debug print statement and a list manipulation pattern that could be more efficient.
Please also consider filling out the pull request description template (Summary, Purpose, Testing, etc.) to provide more context for reviewers and future reference.
Summary of Findings
- List concatenation in
add_trace
: Insrc/judgeval/data/datasets/dataset.py
, theadd_trace
method usesself.traces = self.traces + [t]
. Usingself.traces.append(t)
would be more idiomatic and efficient, especially when adding many traces. - Debug print statement: A
print(payload)
statement was found insrc/judgeval/data/datasets/eval_dataset_client.py
within thepull
method. This should be removed. - Debug log enhancement (low severity - not commented): In
src/judgeval/data/datasets/dataset.py
, the__init__
method's debug logdebug(f"Initializing EvalDataset with {len(examples)} examples")
could be updated to also mention the number of traces being initialized, e.g.,debug(f"Initializing EvalDataset with {len(examples)} examples and {len(traces)} traces")
. This was not added as a direct comment due to review settings.
Merge Readiness
The pull request introduces valuable changes by standardizing terminology and enhancing EvalDataset
capabilities. However, there are a couple of medium-severity issues identified (an inefficient list append and a leftover debug print statement) that should be addressed before merging. Once these are resolved, the PR should be in good shape. As an AI, I am not authorized to approve pull requests; please ensure other reviewers approve these changes before merging.
@@ -220,6 +223,9 @@ def add_from_yaml(self, file_path: str) -> None: | |||
def add_example(self, e: Example) -> None: | |||
self.examples = self.examples + [e] | |||
# TODO if we need to add rank, then we need to do it here | |||
|
|||
def add_trace(self, t: Trace) -> None: | |||
self.traces = self.traces + [t] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for using list concatenation (self.traces + [t]
) here instead of append()
? Using append()
is generally more efficient for adding elements to a list as it modifies the list in-place rather than creating a new list each time. For a large number of traces, this could have a minor performance impact.
self.traces = self.traces + [t] | |
self.traces.append(t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@@ -201,7 +202,9 @@ def pull(self, alias: str, project_name: str) -> EvalDataset: | |||
|
|||
info(f"Successfully pulled dataset with alias '{alias}'") | |||
payload = response.json() | |||
print(payload) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for the most part, just minor things from Gemini
@@ -220,6 +223,9 @@ def add_from_yaml(self, file_path: str) -> None: | |||
def add_example(self, e: Example) -> None: | |||
self.examples = self.examples + [e] | |||
# TODO if we need to add rank, then we need to do it here | |||
|
|||
def add_trace(self, t: Trace) -> None: | |||
self.traces = self.traces + [t] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@@ -201,7 +202,9 @@ def pull(self, alias: str, project_name: str) -> EvalDataset: | |||
|
|||
info(f"Successfully pulled dataset with alias '{alias}'") | |||
payload = response.json() | |||
print(payload) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
6f80dc1
to
669895b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Staging and auto package release (#275) * add auto package release * add staging stuff * add test trigger * remove test trigger * timeout + minor error handling * gemini sugggestions * Trace Usage Edits (#266) * update litellm pyproject and fix e2etest (#276) * update litellm pyproject * add e2etest fix * Agent Names added (#270) * Agent Names added * Fixed agent names for deep tracing * Minor fixes * Dummy commit to trigger staging checks * Fix assert test e2e test (#277) * fix assert test e2e test * Update src/judgeval/judgment_client.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * add evaluation link (#279) * add evaluation link * fix run eval * add e2etest for yaml * Image file updates and social media badges (#274) * updated logos and socials badges * Update README.md * Delete package-lock.json --------- Co-authored-by: Minh Pham <62208322+Mandolaro@users.noreply.github.com> * Testing updates (#269) * updates and s3 e2e tests * clear singleton before tests * coverage, fix http request blocker * test trigger * if always block * change working directory * fix * remove test trigger * gemini suggestions * add new trace column (#281) * change timeout to use variable (#282) * add code coverage for staging e2es (#285) * Add ToolDependency Scorer and Parameter Checking (#253) * Sequence to Trace Conversion * Add agent names with a new decorator * Sequence to Trace Conversion * trace save * comment out; * Changes to yaml and agent file * pydantic * Added tool dependency metric * Changed yaml * Add support for multiple assert test calls * Added parameter checking to ToolDependency metric * Agent Names added * Fixed tool dependency scorer with pydantic change * Added internal scorer for parameter checking * Support for action dependencies added * Changed multi agent test to better show case where param checking helps * Added error check for providing no tools when using param checking * Remove unused parameter --------- Co-authored-by: Alan <alanzhang2021@gmail.com> Co-authored-by: Minh Pham <62208322+Mandolaro@users.noreply.github.com> Co-authored-by: JCamyre <jwcamry03@gmail.com> * modified deep tracing to chain with existing tracefuncs (#286) * Refactor agent names (#287) * Refactor agent names * Minor fix * Minor Gemini fix * Minor fixes to e2etest and TraceSpan datatype * Handle un-serializable types in Traces. Filter 'self' arguments. (#278) * Example Created At (#284) * fix logging for assert test trace (#290) * fix logging for assert test trace * fix agent * E2E additions based on assigned judgeval files (#283) * classifierscorer refactor, other e2es + uts additions * removed e2e * small updates * fix uts * remove print * Make default project name uuid (#293) * Raise warning for missing Org ID or API Key (#294) * Raise warning for missing Org ID or API Key * fix tests * fix tests * Agent state (#295) * agent state into tracer * model dump state before and after * revert return * add field mappings * fix feedback * Update Docs link (#297) * Ahh/http error recording (#289) * feat: tracer http errors as separate part of error output * Update src/judgeval/common/tracer.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * chore: cleanup commented code --------- Co-authored-by: Joseph S Camyre <68767176+JCamyre@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update README.md (#298) * Update Docs link * Add updated Discord link and add links to subsections. * add workflow (#300) * Example Check and User Prompt (#301) * Example Check and User Prompt * Test Fixes * Required Params for Custom Scorers * Trace Datasets (#291) * Trace Datasets * comment fix * python class fix --------- Co-authored-by: Aaryan Divate <44125685+adivate2021@users.noreply.github.com> * Update blocked-pr.yaml (#302) * removed src/demo (#299) Co-authored-by: JCamyre <jwcamry03@gmail.com> --------- Co-authored-by: Alan Zhang <97066812+alanzhang25@users.noreply.github.com> Co-authored-by: Minh Pham <62208322+Mandolaro@users.noreply.github.com> Co-authored-by: Aaryan Divate <44125685+adivate2021@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: shunuen0 <138656153+shunuen0@users.noreply.github.com> Co-authored-by: Alan <alanzhang2021@gmail.com> Co-authored-by: JCamyre <jwcamry03@gmail.com> Co-authored-by: Joseph S Camyre <68767176+JCamyre@users.noreply.github.com> Co-authored-by: rishi763 <55810666+rishi763@users.noreply.github.com> Co-authored-by: Andrew Li <51971573+andrewli2403@users.noreply.github.com> Co-authored-by: Abhishek Govindarasu <abhishekgovindarasu@gmail.com>
π Summary
π― Purpose
π₯ Demo of Changes
π§ͺ Testing
β Checklist
π Linear Issue
βοΈ Additional Notes