Pydantic Typing #268


Merged
merged 2 commits into from
May 20, 2025

Conversation

alanzhang25
Collaborator

πŸ“ Summary

Pydantic Typing for Expected Tools

🎯 Purpose

πŸŽ₯ Demo of Changes

πŸ§ͺ Testing

βœ… Checklist

  • Self-review
  • Video demo of changes
  • Unit Tests and CI/CD tests are passing
  • Reviewers assigned

πŸ“Œ Linear Issue

✏️ Additional Notes

Contributor

@gemini-code-assist bot left a comment
Hello @alanzhang25, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, Gemini here with a summary of this pull request titled "Pydantic Typing". The primary goal of this PR is to introduce stronger typing and data validation for the concept of 'expected tools' within the judgeval library. This is achieved by defining a dedicated Pydantic model for Tool objects and updating existing data structures (Example and TraceSpan) to use this new model instead of generic dictionaries. This change enhances type safety, improves code clarity, and leverages Pydantic's validation capabilities for tool data.

Highlights

  • New Pydantic Model for Tools: A new Pydantic model, Tool, is introduced in src/judgeval/data/tool.py to represent expected tools with defined fields (tool_name, parameters) and basic validation.
  • Type Safety for Expected Tools: The expected_tools field in both the Example model (src/judgeval/data/example.py) and the TraceSpan model (src/judgeval/data/trace.py) is updated to use the new List[Tool] type hint, replacing the less specific List[Dict[str, Any]].
  • Test Adjustment: A minor adjustment is made in src/demo/sequence_test.py to remove the explicit threshold=0.5 argument when instantiating ToolOrderScorer, likely aligning with a default value or change related to the new Tool structure.
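Based on the highlights and changelog above, the new model in src/judgeval/data/tool.py can be sketched roughly as follows (a reconstruction from the review's description, assuming pydantic v2; the real file may differ in details such as docstrings or imports):

```python
import warnings
from typing import Any, Dict, Optional

from pydantic import BaseModel, field_validator


class Tool(BaseModel):
    # Fields described in the PR summary: a required name and optional parameters.
    tool_name: str
    parameters: Optional[Dict[str, Any]] = None

    @field_validator("tool_name")
    def validate_tool_name(cls, v):
        # Per the review, this warns (rather than raises) on an empty name.
        if not v:
            warnings.warn("Tool name is empty or None", UserWarning)
        return v

    @field_validator("parameters")
    def validate_parameters(cls, v):
        # Warns if parameters is present but not a dict.
        if v is not None and not isinstance(v, dict):
            warnings.warn(f"Parameters should be a dictionary, got {type(v)}", UserWarning)
        return v


tool = Tool(tool_name="search", parameters={"query": "pydantic"})
print(tool.tool_name)  # search
```

With this model, expected_tools fields typed as Optional[List[Tool]] get pydantic's parsing and validation instead of accepting arbitrary dicts.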

Changelog

Click here to see the changelog
  • src/demo/sequence_test.py
    • Removed the explicit threshold=0.5 argument from the ToolOrderScorer instantiation (line 162).
  • src/judgeval/data/example.py
    • Imported the new Tool model (line 11).
    • Updated the type hint for the expected_tools field from Optional[List[Dict[str, Any]]] to Optional[List[Tool]] (line 35).
  • src/judgeval/data/tool.py
    • Added a new file defining the Tool Pydantic model.
    • The Tool model includes tool_name (str) and optional parameters (Dict[str, Any]).
    • Added field_validators for tool_name (warns if empty) and parameters (warns if not a dict if present).
  • src/judgeval/data/trace.py
    • Imported the new Tool model (line 4).
    • Updated the type hint for the expected_tools field from Optional[List[Dict[str, Any]]] to Optional[List[Tool]] (line 21).
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with πŸ‘ and πŸ‘Ž on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Types now stand so clear,
No more guessing, banish fear.
Pydantic makes it right.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request introduces Pydantic typing for expected_tools by defining a new Tool model. This is a positive change for type safety and code clarity. The core changes in example.py and trace.py correctly adopt this new Tool type.

I've identified a few areas for discussion, primarily concerning a potentially impactful change in a test file and the validation logic within the new Tool model. Adherence to PEP 8 is generally good in the new code.

Summary of Findings

  • Test Behavior Change: A change in src/demo/sequence_test.py modifies the threshold for ToolOrderScorer from 0.5 to its default of 1.0, potentially making the test stricter. This needs confirmation if intentional.
  • Data Validation in Tool Model: Validators in the new Tool model in src/judgeval/data/tool.py use warnings.warn for potentially invalid states (e.g., empty tool_name). Raising ValueError might be more appropriate for ensuring data integrity.
  • Validator Effectiveness: The validate_parameters method in src/judgeval/data/tool.py might be redundant or ineffective in its current form due to Pydantic's default validator execution mode.
  • Code Style and Documentation (Low Severity - Not Commented): The new file src/judgeval/data/tool.py is missing a newline at the end of the file and lacks docstrings for the class and its methods. These are minor points and were not commented on due to review settings.
  • PR Metadata (Low Severity - Not Commented): The pull request description is largely a template and could be filled out with more specific details about the changes, purpose, and testing. This was not commented on due to review settings.

Merge Readiness

This pull request makes good progress in introducing stricter typing for expected_tools. However, there are a couple of high severity issues identified: one related to a potentially unintended change in test scoring behavior, and another concerning data validation within the new Tool model that should ideally enforce stricter rules (e.g., non-empty tool names).

I recommend addressing these high severity issues before merging. The medium severity issue regarding validator effectiveness should also be considered for clarity and robustness. As a reviewer, I am not authorized to approve pull requests, so please ensure further review and approval from other team members after addressing the feedback. I am setting the status to REQUEST_CHANGES.

src/demo/sequence_test.py
@@ -159,7 +159,7 @@ def generate_itinerary(destination, start_date, end_date):
     judgment.assert_test(
         project_name="travel_agent_demo",
         examples=[example],
-        scorers=[ToolOrderScorer(threshold=0.5)],
+        scorers=[ToolOrderScorer()],
Severity: high

It seems the threshold parameter was removed from the ToolOrderScorer instantiation. The ToolOrderScorer class (defined in src/judgeval/scorers/judgeval_scorers/api_scorers/tool_order.py) has a default threshold of 1.0. Previously, it was explicitly set to 0.5 here.

This change means the scorer will now be stricter (requiring a perfect score of 1.0 by default, instead of 0.5).

Could you clarify if this change in scoring behavior for the test is intentional? If this test was designed to pass or measure behavior at a 0.5 threshold, changing it to 1.0 might alter the test's purpose or outcome.
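To make the reviewer's point concrete, here is a hypothetical, simplified stand-in for the scorer (not judgeval's real implementation) that only models the threshold semantics described above, assuming a pass means score >= threshold and a default of 1.0:

```python
class ToolOrderScorer:
    """Toy stand-in: models only the threshold behavior discussed in the review."""

    def __init__(self, threshold: float = 1.0):
        # Default of 1.0 mirrors the default the review attributes to the real scorer.
        self.threshold = threshold

    def passes(self, score: float) -> bool:
        # A run passes only if its score meets the threshold.
        return score >= self.threshold


print(ToolOrderScorer(threshold=0.5).passes(0.7))  # True: 0.7 >= 0.5
print(ToolOrderScorer().passes(0.7))               # False: the default 1.0 demands a perfect score
```

Under this reading, removing threshold=0.5 silently tightens the test from "at least half right" to "exactly right".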

Comment on lines +9 to +13
@field_validator('tool_name')
def validate_tool_name(cls, v):
    if not v:
        warnings.warn("Tool name is empty or None", UserWarning)
    return v
Severity: high

The validate_tool_name validator currently issues a UserWarning if the tool_name is empty or None.

Is an empty or missing tool_name a permissible state for a Tool object? If a tool_name is essential for the Tool to be valid and usable, it might be more appropriate to raise a ValueError here. This would prevent the creation of Tool objects with invalid names and provide a clearer error signal to the developer.

What are your thoughts on making this validation stricter?

Suggested change:

@field_validator('tool_name')
def validate_tool_name(cls, v):
    if not v:
        raise ValueError("Tool name cannot be empty or None")
    return v

Comment on lines +15 to +19
@field_validator('parameters')
def validate_parameters(cls, v):
    if v is not None and not isinstance(v, dict):
        warnings.warn(f"Parameters should be a dictionary, got {type(v)}", UserWarning)
    return v
Severity: medium

This validator checks if parameters (aliased as v here) is not a dictionary when it's not None, and issues a warning.

Since Pydantic field validators default to mode='after', this validator runs after Pydantic has attempted its own type validation/coercion for the parameters field (which is typed as Optional[Dict[str, Any]]).

If a non-dictionary value (that Pydantic cannot coerce to a dict) is provided for parameters, Pydantic itself would typically raise a ValidationError before this custom validator even runs. For example, Tool(tool_name="test", parameters="not-a-dict") would likely fail Pydantic's initial parsing.

Therefore, the condition v is not None and not isinstance(v, dict) might rarely (if ever) be true when this validator executes in 'after' mode, as Pydantic would have already ensured v is a dict or raised an error.

Could you clarify the intended behavior here?
If the goal is to provide a custom warning before Pydantic's error, mode='before' might be needed. Otherwise, this specific check for isinstance(v, dict) might be redundant with Pydantic's built-in type enforcement.
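The behavior the reviewer describes can be demonstrated with a minimal model (assuming pydantic v2): with the default mode='after', the custom validator never sees a non-dict value, because pydantic's own type validation rejects it first.

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel, ValidationError, field_validator


class Tool(BaseModel):
    tool_name: str
    parameters: Optional[Dict[str, Any]] = None

    @field_validator("parameters")  # mode='after' by default
    def validate_parameters(cls, v):
        # In 'after' mode this only ever runs on values pydantic already accepted.
        print("custom validator saw:", type(v).__name__)
        return v


try:
    Tool(tool_name="test", parameters="not-a-dict")
except ValidationError:
    print("ValidationError raised before the custom validator ran")
```

So the isinstance check inside the validator is effectively dead code in 'after' mode; moving it to mode='before' (or deleting it) would match the stated intent.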

Contributor

@adivate2021 left a comment

LGTM

@adivate2021 merged commit 7fc322f into main on May 20, 2025
3 checks passed
@adivate2021 deleted the az-tool-pydantic branch on May 20, 2025 at 23:32