Skip to content

Ahh/new error format (#273) #288

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 30, 2025
Merged

Ahh/new error format (#273) #288

merged 1 commit into from
May 30, 2025

Conversation

abhishekg999
Copy link
Contributor

  • feat: start of new record_error

  • feat: error property on TraceSpan

  • chore(docs): update observe also records Error

  • chore: cleanup capture exception impl

  • chore: better comment on _serialize_value


📝 Summary

🎯 Purpose

🎥 Demo of Changes

🧪 Testing

✅ Checklist

  • Self-review
  • Video demo of changes
  • Unit Tests and CI/CD tests are passing
  • Reviewers assigned

📌 Linear Issue

✏️ Additional Notes

* feat: start of new record_error

* feat: error property on TraceSpan

* chore(docs): update observe also records Error

* chore: cleanup capture exception impl

* chore: better comment on _serialize_value

---------

Co-authored-by: Minh Pham <62208322+Mandolaro@users.noreply.github.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @abhishekg999, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, gemini-code-assist here to provide a summary of this pull request. This PR, titled "Ahh/new error format", introduces a dedicated mechanism for capturing and storing error information within trace spans. The primary goal is to move away from embedding error details within the span's output and instead provide a structured error field. This involves adding the new field to the TraceSpan data model, implementing methods to record errors on spans, and modifying the tracing decorators (@observe) to catch exceptions and record them using the new format. Documentation is also updated to reflect that errors are now captured.

Highlights

  • Dedicated Error Field: A new optional error field (dictionary) has been added to the TraceSpan data model (src/judgeval/data/trace.py, line 17) to store structured error information, separating it from the function's output.
  • Record Error Method: A record_error method is added to the TraceClient class (src/judgeval/common/tracer.py, line 501) allowing errors to be explicitly associated with the current active span.
  • Exception Capturing Logic: A new helper function _capture_exception_for_trace (src/judgeval/common/tracer.py, line 680) is introduced to format Python exception details (type, message, traceback) into the dictionary structure expected by the new error field. The tracing decorators (async_wrapper and wrapper in src/judgeval/common/tracer.py, lines 1157, 1178, 1224, 1246) are updated to catch exceptions and call this helper, recording the error on the span before re-raising the exception. The sys.settrace handler (_trace in src/judgeval/common/tracer.py, line 871) is also updated to use this helper and now explicitly ignores common generator-related exceptions (StopIteration, StopAsyncIteration, GeneratorExit).
  • Serialization Refactor: The serialization logic within TraceSpan (src/judgeval/data/trace.py, line 74) is refactored. The separate _serialize_inputs and _serialize_output methods are removed, replaced by a single _serialize_value method that handles serialization for inputs, output, and the new error field, supporting Pydantic models and gracefully handling non-serializable types.
  • Documentation Update: The documentation for the @observe decorator (docs/monitoring/tracing.mdx, line 90) is updated to explicitly state that it captures errors in addition to input and output.

Changelog

Click here to see the changelog
  • docs/monitoring/tracing.mdx
    • Updated @observe documentation to mention error capture.
  • src/judgeval/common/tracer.py
    • Added record_error method to TraceClient.
    • Added _capture_exception_for_trace helper for formatting exceptions.
    • Modified _trace handler to use _capture_exception_for_trace and ignore generator exceptions.
    • Wrapped function calls in async_wrapper and wrapper with try/except blocks to capture and record exceptions using the new mechanism.
  • src/judgeval/data/trace.py
    • Added error field to TraceSpan model.
    • Refactored serialization logic: removed _serialize_inputs and _serialize_output, added _serialize_value.
    • Updated model_dump to use _serialize_value for inputs, output, and error.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Code runs, then it breaks,
A traceback error makes,
Now spans hold the fault,
No longer in output vault,
For easier debug stakes.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@abhishekg999
Copy link
Contributor Author

This PR was merged into main into staging, backmerging to update staging

@abhishekg999 abhishekg999 enabled auto-merge (squash) May 29, 2025 02:55
@abhishekg999 abhishekg999 requested a review from JCamyre May 29, 2025 02:56
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new error recording mechanism, which is a valuable addition for tracing and debugging. Key changes include adding an error field to TraceSpan, a new record_error method, and a helper function _capture_exception_for_trace for consistent error formatting. The exception handling in the @observe decorator and _DeepTracer has also been improved.

Overall, the changes are well-implemented and enhance the robustness and utility of the tracing system. The refactoring of serialization logic into a unified _serialize_value method is a notable improvement for code clarity and maintainability. The code generally adheres to PEP 8 style guidelines.

I have one suggestion for improving type clarity, detailed in the comments.

Summary of Findings

  • Type Clarity for TraceSpan.error: Suggested using TypedDict for the TraceSpan.error field to provide a more specific type definition than Optional[Dict[str, Any]], enhancing type safety and clarity for the structured error information. This was provided as a review comment.
  • Comment Clarity in _DeepTracer._trace: In src/judgeval/common/tracer.py, line 875 (RIGHT), the comment // exception handling will take priority. within the return event handling block could be slightly confusing as the exception event would typically handle exceptions. This is a minor clarity point and was not commented on due to review settings (low severity).

Merge Readiness

The pull request is in good shape and introduces valuable improvements to error handling in the tracing system. The code is clear and the refactoring efforts are commendable. There is one medium-severity suggestion regarding the use of TypedDict for the error field in TraceSpan to enhance type safety, which I recommend considering.

I am not authorized to approve pull requests. However, from my review, after addressing or considering the provided feedback, this PR should be ready for merging pending further reviews and approvals from the team.

@@ -14,6 +14,7 @@ class TraceSpan(BaseModel):
parent_span_id: Optional[str] = None
span_type: Optional[str] = "span"
inputs: Optional[Dict[str, Any]] = None
error: Optional[Dict[str, Any]] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The error field is currently typed as Optional[Dict[str, Any]]. While flexible, this doesn't fully convey the expected structure of the error dictionary ({'type': str, 'message': str, 'traceback': List[str]}).

Consider using a TypedDict to define the structure of the formatted exception. This would improve type safety and make it clearer to consumers of the trace data what to expect in the error field. What are your thoughts on this approach for better type definition?

from typing import TypedDict, List, Optional, Dict, Any # Ensure TypedDict is imported

class FormattedException(TypedDict):
    type: str
    message: str
    traceback: List[str]

class TraceSpan(BaseModel):
    span_id: str
    trace_id: str
    function: Optional[str] = None
    depth: int
    created_at: Optional[Any] = None
    parent_span_id: Optional[str] = None
    span_type: Optional[str] = "span"
    inputs: Optional[Dict[str, Any]] = None
    error: Optional[FormattedException] = None # Use TypedDict here
    output: Optional[Any] = None
    duration: Optional[float] = None
    annotation: Optional[List[Dict[str, Any]]] = None
    evaluation_runs: Optional[List[EvaluationRun]] = []
    expected_tools: Optional[List[Tool]] = None
    additional_metadata: Optional[Dict[str, Any]] = None

@justinsheu justinsheu merged commit fa1bfe3 into staging May 30, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants