Skip to content

Pydantic Typing #268

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 20, 2025
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/demo/sequence_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -159,7 +159,7 @@ def generate_itinerary(destination, start_date, end_date):
judgment.assert_test(
project_name="travel_agent_demo",
examples=[example],
scorers=[ToolOrderScorer(threshold=0.5)],
scorers=[ToolOrderScorer()],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

It seems the threshold parameter was removed from the ToolOrderScorer instantiation. The ToolOrderScorer class (defined in src/judgeval/scorers/judgeval_scorers/api_scorers/tool_order.py) has a default threshold of 1.0. Previously, it was explicitly set to 0.5 here.

This change means the scorer will now be stricter (requiring a perfect score of 1.0 by default, instead of 0.5).

Could you clarify if this change in scoring behavior for the test is intentional? If this test was designed to pass or measure behavior at a 0.5 threshold, changing it to 1.0 might alter the test's purpose or outcome.

model="gpt-4.1-mini",
function=generate_itinerary,
tracer=tracer,
Expand Down
3 changes: 2 additions & 1 deletion src/judgeval/data/example.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
from pydantic import BaseModel, Field, field_validator
from enum import Enum
from datetime import datetime
from judgeval.data.tool import Tool
import time


Expand All @@ -31,7 +32,7 @@ class Example(BaseModel):
retrieval_context: Optional[List[str]] = None
additional_metadata: Optional[Dict[str, Any]] = None
tools_called: Optional[List[str]] = None
expected_tools: Optional[List[Dict[str, Any]]] = None
expected_tools: Optional[List[Tool]] = None
name: Optional[str] = None
example_id: str = Field(default_factory=lambda: str(uuid4()))
example_index: Optional[int] = None
Expand Down
19 changes: 19 additions & 0 deletions src/judgeval/data/tool.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
from pydantic import BaseModel, field_validator
from typing import Dict, Any, Optional
import warnings

class Tool(BaseModel):
tool_name: str
parameters: Optional[Dict[str, Any]] = None

@field_validator('tool_name')
def validate_tool_name(cls, v):
if not v:
warnings.warn("Tool name is empty or None", UserWarning)
return v
Comment on lines +9 to +13
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The validate_tool_name validator currently issues a UserWarning if the tool_name is empty or None.

Is an empty or missing tool_name a permissible state for a Tool object? If a tool_name is essential for the Tool to be valid and usable, it might be more appropriate to raise a ValueError here. This would prevent the creation of Tool objects with invalid names and provide a clearer error signal to the developer.

What are your thoughts on making this validation stricter?

Suggested change
@field_validator('tool_name')
def validate_tool_name(cls, v):
if not v:
warnings.warn("Tool name is empty or None", UserWarning)
return v
@field_validator('tool_name')
def validate_tool_name(cls, v):
if not v:
raise ValueError("Tool name cannot be empty or None")
return v


@field_validator('parameters')
def validate_parameters(cls, v):
if v is not None and not isinstance(v, dict):
warnings.warn(f"Parameters should be a dictionary, got {type(v)}", UserWarning)
return v
Comment on lines +15 to +19
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This validator checks if parameters (aliased as v here) is not a dictionary when it's not None, and issues a warning.

Since Pydantic field validators default to mode='after', this validator runs after Pydantic has attempted its own type validation/coercion for the parameters field (which is typed as Optional[Dict[str, Any]]).

If a non-dictionary value (that Pydantic cannot coerce to a dict) is provided for parameters, Pydantic itself would typically raise a ValidationError before this custom validator even runs. For example, Tool(tool_name="test", parameters="not-a-dict") would likely fail Pydantic's initial parsing.

Therefore, the condition v is not None and not isinstance(v, dict) might rarely (if ever) be true when this validator executes in 'after' mode, as Pydantic would have already ensured v is a dict or raised an error.

Could you clarify the intended behavior here?
If the goal is to provide a custom warning before Pydantic's error, mode='before' might be needed. Otherwise, this specific check for isinstance(v, dict) might be redundant with Pydantic's built-in type enforcement.

3 changes: 2 additions & 1 deletion src/judgeval/data/trace.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
from pydantic import BaseModel
from typing import Optional, Dict, Any, List
from judgeval.evaluation_run import EvaluationRun
from judgeval.data.tool import Tool
import json
from datetime import datetime, timezone

Expand All @@ -17,7 +18,7 @@ class TraceSpan(BaseModel):
duration: Optional[float] = None
annotation: Optional[List[Dict[str, Any]]] = None
evaluation_runs: Optional[List[EvaluationRun]] = []
expected_tools: Optional[List[Dict[str, Any]]] = None
expected_tools: Optional[List[Tool]] = None
additional_metadata: Optional[Dict[str, Any]] = None

def model_dump(self, **kwargs):
Expand Down
Loading