Pydantic Typing #268
Conversation
Hello @alanzhang25, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, Gemini here with a summary of this pull request titled "Pydantic Typing". The primary goal of this PR is to introduce stronger typing and data validation for the concept of 'expected tools' within the `judgeval` library. This is achieved by defining a dedicated Pydantic model for `Tool` objects and updating existing data structures (`Example` and `TraceSpan`) to use this new model instead of generic dictionaries. This change enhances type safety, improves code clarity, and leverages Pydantic's validation capabilities for tool data.
Highlights
- **New Pydantic Model for Tools:** A new Pydantic model, `Tool`, is introduced in `src/judgeval/data/tool.py` to represent expected tools with defined fields (`tool_name`, `parameters`) and basic validation.
- **Type Safety for Expected Tools:** The `expected_tools` field in both the `Example` model (`src/judgeval/data/example.py`) and the `TraceSpan` model (`src/judgeval/data/trace.py`) is updated to use the new `List[Tool]` type hint, replacing the less specific `List[Dict[str, Any]]`.
- **Test Adjustment:** A minor adjustment is made in `src/demo/sequence_test.py` to remove the explicit `threshold=0.5` argument when instantiating `ToolOrderScorer`, likely aligning with a default value or a change related to the new `Tool` structure.
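Putting the highlights together, the new model as summarized might look roughly like the sketch below. This is a reconstruction from the PR summary, not the PR's actual file; the example field values (`flight_search`, `destination`) are hypothetical, chosen to match the travel-agent demo mentioned later.

```python
import warnings
from typing import Any, Dict, Optional

from pydantic import BaseModel, field_validator


class Tool(BaseModel):
    """Expected tool call, as described in the PR summary (sketch)."""

    tool_name: str
    parameters: Optional[Dict[str, Any]] = None

    @field_validator("tool_name")
    @classmethod
    def validate_tool_name(cls, v: str) -> str:
        # Per the summary, the PR warns (rather than raises) on an empty name.
        if not v:
            warnings.warn("Tool name is empty or None", UserWarning)
        return v


# Hypothetical usage: an expected tool call for a travel-agent example.
tool = Tool(tool_name="flight_search", parameters={"destination": "Tokyo"})
```

With this in place, `expected_tools=[tool]` type-checks against the new `Optional[List[Tool]]` hint instead of a bare list of dicts.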
Changelog
Click here to see the changelog
- `src/demo/sequence_test.py`
  - Removed the explicit `threshold=0.5` argument from the `ToolOrderScorer` instantiation (line 162).
- `src/judgeval/data/example.py`
  - Imported the new `Tool` model (line 11).
  - Updated the type hint for the `expected_tools` field from `Optional[List[Dict[str, Any]]]` to `Optional[List[Tool]]` (line 35).
- `src/judgeval/data/tool.py`
  - Added a new file defining the `Tool` Pydantic model.
  - The `Tool` model includes `tool_name` (str) and optional `parameters` (Dict[str, Any]).
  - Added `field_validator`s for `tool_name` (warns if empty) and `parameters` (warns if not a dict when present).
- `src/judgeval/data/trace.py`
  - Imported the new `Tool` model (line 4).
  - Updated the type hint for the `expected_tools` field from `Optional[List[Dict[str, Any]]]` to `Optional[List[Tool]]` (line 21).
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Types now stand so clear,
No more guessing, banish fear.
Pydantic makes it right.
Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request introduces Pydantic typing for `expected_tools` by defining a new `Tool` model. This is a positive change for type safety and code clarity. The core changes in `example.py` and `trace.py` correctly adopt this new `Tool` type.

I've identified a few areas for discussion, primarily concerning a potentially impactful change in a test file and the validation logic within the new `Tool` model. Adherence to PEP 8 is generally good in the new code.
Summary of Findings
- **Test Behavior Change:** A change in `src/demo/sequence_test.py` modifies the threshold for `ToolOrderScorer` from 0.5 to its default of 1.0, potentially making the test stricter. This needs confirmation if intentional.
- **Data Validation in `Tool` Model:** Validators in the new `Tool` model in `src/judgeval/data/tool.py` use `warnings.warn` for potentially invalid states (e.g., an empty `tool_name`). Raising `ValueError` might be more appropriate for ensuring data integrity.
- **Validator Effectiveness:** The `validate_parameters` method in `src/judgeval/data/tool.py` might be redundant or ineffective in its current form due to Pydantic's default validator execution mode.
- **Code Style and Documentation (Low Severity - Not Commented):** The new file `src/judgeval/data/tool.py` is missing a newline at the end of the file and lacks docstrings for the class and its methods. These are minor points and were not commented on due to review settings.
- **PR Metadata (Low Severity - Not Commented):** The pull request description is largely a template and could be filled out with more specific details about the changes, purpose, and testing. This was not commented on due to review settings.
Merge Readiness
This pull request makes good progress in introducing stricter typing for `expected_tools`. However, there are a couple of `high` severity issues identified: one related to a potentially unintended change in test scoring behavior, and another concerning data validation within the new `Tool` model, which should ideally enforce stricter rules (e.g., non-empty tool names).

I recommend addressing these `high` severity issues before merging. The `medium` severity issue regarding validator effectiveness should also be considered for clarity and robustness. As a reviewer, I am not authorized to approve pull requests, so please ensure further review and approval from other team members after addressing the feedback. I am setting the status to `REQUEST_CHANGES`.
```diff
@@ -159,7 +159,7 @@ def generate_itinerary(destination, start_date, end_date):
 judgment.assert_test(
     project_name="travel_agent_demo",
     examples=[example],
-    scorers=[ToolOrderScorer(threshold=0.5)],
+    scorers=[ToolOrderScorer()],
```
It seems the `threshold` parameter was removed from the `ToolOrderScorer` instantiation. The `ToolOrderScorer` class (defined in `src/judgeval/scorers/judgeval_scorers/api_scorers/tool_order.py`) has a default `threshold` of `1.0`. Previously, it was explicitly set to `0.5` here.

This change means the scorer will now be stricter (requiring a perfect score of 1.0 by default, instead of 0.5).

Could you clarify whether this change in scoring behavior for the test is intentional? If this test was designed to pass or measure behavior at a 0.5 threshold, changing it to 1.0 might alter the test's purpose or outcome.
```python
@field_validator('tool_name')
def validate_tool_name(cls, v):
    if not v:
        warnings.warn("Tool name is empty or None", UserWarning)
    return v
```
The `validate_tool_name` validator currently issues a `UserWarning` if the `tool_name` is empty or `None`.

Is an empty or missing `tool_name` a permissible state for a `Tool` object? If a `tool_name` is essential for the `Tool` to be valid and usable, it might be more appropriate to raise a `ValueError` here. This would prevent the creation of `Tool` objects with invalid names and provide a clearer error signal to the developer.

What are your thoughts on making this validation stricter?
```diff
 @field_validator('tool_name')
 def validate_tool_name(cls, v):
     if not v:
-        warnings.warn("Tool name is empty or None", UserWarning)
+        raise ValueError("Tool name cannot be empty or None")
     return v
```
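To illustrate the effect of the stricter variant: in Pydantic v2, a `ValueError` raised inside a field validator is collected and re-raised as a `ValidationError` at construction time, so an empty name can never produce a `Tool` instance. This is a minimal standalone sketch, not the PR's actual file.

```python
from pydantic import BaseModel, ValidationError, field_validator


class Tool(BaseModel):
    tool_name: str

    @field_validator("tool_name")
    @classmethod
    def validate_tool_name(cls, v: str) -> str:
        if not v:
            # Pydantic wraps this ValueError in a ValidationError for the caller.
            raise ValueError("Tool name cannot be empty or None")
        return v


try:
    Tool(tool_name="")
    rejected = False
except ValidationError:
    rejected = True  # invalid Tool was never created
```

Compared with the warning-based version, the failure here is loud and happens exactly where the bad data enters the system.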
```python
@field_validator('parameters')
def validate_parameters(cls, v):
    if v is not None and not isinstance(v, dict):
        warnings.warn(f"Parameters should be a dictionary, got {type(v)}", UserWarning)
    return v
```
This validator checks whether `parameters` (bound to `v` here) is not a dictionary when it is not `None`, and issues a warning.

Since Pydantic field validators default to `mode='after'`, this validator runs after Pydantic has attempted its own type validation/coercion for the `parameters` field (which is typed as `Optional[Dict[str, Any]]`).

If a non-dictionary value (that Pydantic cannot coerce to a dict) is provided for `parameters`, Pydantic itself would typically raise a `ValidationError` before this custom validator even runs. For example, `Tool(tool_name="test", parameters="not-a-dict")` would likely fail Pydantic's initial parsing.

Therefore, the condition `v is not None and not isinstance(v, dict)` might rarely (if ever) be true when this validator executes in `'after'` mode, as Pydantic would have already ensured `v` is a `dict` or raised an error.

Could you clarify the intended behavior here? If the goal is to provide a custom warning before Pydantic's error, `mode='before'` might be needed. Otherwise, this specific check for `isinstance(v, dict)` might be redundant with Pydantic's built-in type enforcement.
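A quick sketch of the execution-order point (assuming Pydantic v2; the `Tool` shape mirrors the PR's): with `mode='before'`, the custom check receives the raw input ahead of Pydantic's type validation, so a non-dict value is actually observable and the warning fires before Pydantic rejects the value.

```python
import warnings
from typing import Any, Dict, Optional

from pydantic import BaseModel, ValidationError, field_validator


class Tool(BaseModel):
    tool_name: str
    parameters: Optional[Dict[str, Any]] = None

    @field_validator("parameters", mode="before")
    @classmethod
    def warn_on_non_dict(cls, v: Any) -> Any:
        # Runs on the raw input, before Pydantic's own type check.
        if v is not None and not isinstance(v, dict):
            warnings.warn(f"Parameters should be a dictionary, got {type(v)}", UserWarning)
        return v


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    try:
        Tool(tool_name="test", parameters="not-a-dict")
        failed = False
    except ValidationError:
        failed = True  # Pydantic still rejects the non-dict value afterwards

# The custom warning fired before the ValidationError -- an 'after'
# validator would never have seen the non-dict value at all.
warned = any("Parameters should be a dictionary" in str(w.message) for w in caught)
```

In `mode='after'` the same `isinstance` check is dead code for this field, which is the redundancy the comment above describes.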
`bea2e41` to `7e4d4cf`
LGTM
Summary

Pydantic Typing for Expected Tools

Purpose

Demo of Changes

Testing

Checklist

Linear Issue

Additional Notes