Pydantic Typing #268


Merged
merged 2 commits into from
May 20, 2025

Conversation

alanzhang25
Collaborator

πŸ“ Summary

Pydantic Typing for Expected Tools

🎯 Purpose

πŸŽ₯ Demo of Changes

πŸ§ͺ Testing

βœ… Checklist

  • Self-review
  • Video demo of changes
  • Unit Tests and CI/CD tests are passing
  • Reviewers assigned

πŸ“Œ Linear Issue

✏️ Additional Notes

Contributor

@gemini-code-assist bot left a comment
Hello @alanzhang25, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, Gemini here with a summary of this pull request titled "Pydantic Typing". The primary goal of this PR is to introduce stronger typing and data validation for the concept of 'expected tools' within the judgeval library. This is achieved by defining a dedicated Pydantic model for Tool objects and updating existing data structures (Example and TraceSpan) to use this new model instead of generic dictionaries. This change enhances type safety, improves code clarity, and leverages Pydantic's validation capabilities for tool data.

Highlights

  • New Pydantic Model for Tools: A new Pydantic model, Tool, is introduced in src/judgeval/data/tool.py to represent expected tools with defined fields (tool_name, parameters) and basic validation.
  • Type Safety for Expected Tools: The expected_tools field in both the Example model (src/judgeval/data/example.py) and the TraceSpan model (src/judgeval/data/trace.py) is updated to use the new List[Tool] type hint, replacing the less specific List[Dict[str, Any]].
  • Test Adjustment: A minor adjustment is made in src/demo/sequence_test.py to remove the explicit threshold=0.5 argument when instantiating ToolOrderScorer, likely aligning with a default value or change related to the new Tool structure.
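Based on the highlights and changelog above, the new model in src/judgeval/data/tool.py can be sketched roughly as follows (a reconstruction from the review's description, assuming pydantic v2; the real file may differ in details such as docstrings or imports):

```python
import warnings
from typing import Any, Dict, Optional

from pydantic import BaseModel, field_validator


class Tool(BaseModel):
    # Fields described in the PR summary: a required name and optional parameters.
    tool_name: str
    parameters: Optional[Dict[str, Any]] = None

    @field_validator("tool_name")
    def validate_tool_name(cls, v):
        # Per the review, this warns (rather than raises) on an empty name.
        if not v:
            warnings.warn("Tool name is empty or None", UserWarning)
        return v

    @field_validator("parameters")
    def validate_parameters(cls, v):
        # Warns if parameters is present but not a dict.
        if v is not None and not isinstance(v, dict):
            warnings.warn(f"Parameters should be a dictionary, got {type(v)}", UserWarning)
        return v


tool = Tool(tool_name="search", parameters={"query": "pydantic"})
print(tool.tool_name)  # search
```

With this model, expected_tools fields typed as Optional[List[Tool]] get pydantic's parsing and validation instead of accepting arbitrary dicts.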

Changelog

Click here to see the changelog
  • src/demo/sequence_test.py
    • Removed the explicit threshold=0.5 argument from the ToolOrderScorer instantiation (line 162).
  • src/judgeval/data/example.py
    • Imported the new Tool model (line 11).
    • Updated the type hint for the expected_tools field from Optional[List[Dict[str, Any]]] to Optional[List[Tool]] (line 35).
  • src/judgeval/data/tool.py
    • Added a new file defining the Tool Pydantic model.
    • The Tool model includes tool_name (str) and optional parameters (Dict[str, Any]).
    • Added field_validators for tool_name (warns if empty) and parameters (warns if not a dict if present).
  • src/judgeval/data/trace.py
    • Imported the new Tool model (line 4).
    • Updated the type hint for the expected_tools field from Optional[List[Dict[str, Any]]] to Optional[List[Tool]] (line 21).
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with πŸ‘ and πŸ‘Ž on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Types now stand so clear,
No more guessing, banish fear.
Pydantic makes it right.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request introduces Pydantic typing for expected_tools by defining a new Tool model. This is a positive change for type safety and code clarity. The core changes in example.py and trace.py correctly adopt this new Tool type.

I've identified a few areas for discussion, primarily concerning a potentially impactful change in a test file and the validation logic within the new Tool model. Adherence to PEP 8 is generally good in the new code.

Summary of Findings

  • Test Behavior Change: A change in src/demo/sequence_test.py modifies the threshold for ToolOrderScorer from 0.5 to its default of 1.0, potentially making the test stricter. This needs confirmation if intentional.
  • Data Validation in Tool Model: Validators in the new Tool model in src/judgeval/data/tool.py use warnings.warn for potentially invalid states (e.g., empty tool_name). Raising ValueError might be more appropriate for ensuring data integrity.
  • Validator Effectiveness: The validate_parameters method in src/judgeval/data/tool.py might be redundant or ineffective in its current form due to Pydantic's default validator execution mode.
  • Code Style and Documentation (Low Severity - Not Commented): The new file src/judgeval/data/tool.py is missing a newline at the end of the file and lacks docstrings for the class and its methods. These are minor points and were not commented on due to review settings.
  • PR Metadata (Low Severity - Not Commented): The pull request description is largely a template and could be filled out with more specific details about the changes, purpose, and testing. This was not commented on due to review settings.

Merge Readiness

This pull request makes good progress in introducing stricter typing for expected_tools. However, there are a couple of high severity issues identified: one related to a potentially unintended change in test scoring behavior, and another concerning data validation within the new Tool model that should ideally enforce stricter rules (e.g., non-empty tool names).

I recommend addressing these high severity issues before merging. The medium severity issue regarding validator effectiveness should also be considered for clarity and robustness. As a reviewer, I am not authorized to approve pull requests, so please ensure further review and approval from other team members after addressing the feedback. I am setting the status to REQUEST_CHANGES.

src/demo/sequence_test.py
@@ -159,7 +159,7 @@ def generate_itinerary(destination, start_date, end_date):
     judgment.assert_test(
         project_name="travel_agent_demo",
         examples=[example],
-        scorers=[ToolOrderScorer(threshold=0.5)],
+        scorers=[ToolOrderScorer()],
Severity: high

It seems the threshold parameter was removed from the ToolOrderScorer instantiation. The ToolOrderScorer class (defined in src/judgeval/scorers/judgeval_scorers/api_scorers/tool_order.py) has a default threshold of 1.0. Previously, it was explicitly set to 0.5 here.

This change means the scorer will now be stricter (requiring a perfect score of 1.0 by default, instead of 0.5).

Could you clarify if this change in scoring behavior for the test is intentional? If this test was designed to pass or measure behavior at a 0.5 threshold, changing it to 1.0 might alter the test's purpose or outcome.
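To make the reviewer's point concrete, here is a hypothetical, simplified stand-in for the scorer (not judgeval's real implementation) that only models the threshold semantics described above, assuming a pass means score >= threshold and a default of 1.0:

```python
class ToolOrderScorer:
    """Toy stand-in: models only the threshold behavior discussed in the review."""

    def __init__(self, threshold: float = 1.0):
        # Default of 1.0 mirrors the default the review attributes to the real scorer.
        self.threshold = threshold

    def passes(self, score: float) -> bool:
        # A run passes only if its score meets the threshold.
        return score >= self.threshold


print(ToolOrderScorer(threshold=0.5).passes(0.7))  # True: 0.7 >= 0.5
print(ToolOrderScorer().passes(0.7))               # False: the default 1.0 demands a perfect score
```

Under this reading, removing threshold=0.5 silently tightens the test from "at least half right" to "exactly right".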

Comment on lines +9 to +13
@field_validator('tool_name')
def validate_tool_name(cls, v):
    if not v:
        warnings.warn("Tool name is empty or None", UserWarning)
    return v
Severity: high

The validate_tool_name validator currently issues a UserWarning if the tool_name is empty or None.

Is an empty or missing tool_name a permissible state for a Tool object? If a tool_name is essential for the Tool to be valid and usable, it might be more appropriate to raise a ValueError here. This would prevent the creation of Tool objects with invalid names and provide a clearer error signal to the developer.

What are your thoughts on making this validation stricter?

Suggested change:

@field_validator('tool_name')
def validate_tool_name(cls, v):
    if not v:
        raise ValueError("Tool name cannot be empty or None")
    return v

Comment on lines +15 to +19
@field_validator('parameters')
def validate_parameters(cls, v):
    if v is not None and not isinstance(v, dict):
        warnings.warn(f"Parameters should be a dictionary, got {type(v)}", UserWarning)
    return v
Severity: medium

This validator checks if parameters (aliased as v here) is not a dictionary when it's not None, and issues a warning.

Since Pydantic field validators default to mode='after', this validator runs after Pydantic has attempted its own type validation/coercion for the parameters field (which is typed as Optional[Dict[str, Any]]).

If a non-dictionary value (that Pydantic cannot coerce to a dict) is provided for parameters, Pydantic itself would typically raise a ValidationError before this custom validator even runs. For example, Tool(tool_name="test", parameters="not-a-dict") would likely fail Pydantic's initial parsing.

Therefore, the condition v is not None and not isinstance(v, dict) might rarely (if ever) be true when this validator executes in 'after' mode, as Pydantic would have already ensured v is a dict or raised an error.

Could you clarify the intended behavior here?
If the goal is to provide a custom warning before Pydantic's error, mode='before' might be needed. Otherwise, this specific check for isinstance(v, dict) might be redundant with Pydantic's built-in type enforcement.
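The behavior the reviewer describes can be demonstrated with a minimal model (assuming pydantic v2): with the default mode='after', the custom validator never sees a non-dict value, because pydantic's own type validation rejects it first.

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel, ValidationError, field_validator


class Tool(BaseModel):
    tool_name: str
    parameters: Optional[Dict[str, Any]] = None

    @field_validator("parameters")  # mode='after' by default
    def validate_parameters(cls, v):
        # In 'after' mode this only ever runs on values pydantic already accepted.
        print("custom validator saw:", type(v).__name__)
        return v


try:
    Tool(tool_name="test", parameters="not-a-dict")
except ValidationError:
    print("ValidationError raised before the custom validator ran")
```

So the isinstance check inside the validator is effectively dead code in 'after' mode; moving it to mode='before' (or deleting it) would match the stated intent.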

Contributor

@adivate2021 left a comment

LGTM

@adivate2021 merged commit 7fc322f into main on May 20, 2025
3 checks passed
@adivate2021 deleted the az-tool-pydantic branch on May 20, 2025 at 23:32