Skip to content

Ahh/new error format #273

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 26, 2025
Merged

Ahh/new error format #273

merged 6 commits into from
May 26, 2025

Conversation

abhishekg999
Copy link
Contributor

πŸ“ Summary

Adds recording of errors and traceback in deep trace and regular, in the new TraceSpan error field.

🎯 Purpose

πŸŽ₯ Demo of Changes

πŸ§ͺ Testing

βœ… Checklist

  • Self-review
  • Video demo of changes
  • Unit Tests and CI/CD tests are passing
  • Reviewers assigned

πŸ“Œ Linear Issue

✏️ Additional Notes

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @abhishekg999, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request to help everyone get oriented. This PR, titled "Ahh/new error format", aims to enhance the tracing system by introducing a dedicated field for recording errors and their tracebacks within trace spans. The primary goal is to capture detailed error information during both regular and deep tracing, for both synchronous and asynchronous function calls, and store it directly within the relevant TraceSpan object rather than potentially mixing it with regular output.

Highlights

  • Dedicated Error Recording Method: A new method record_error(self, error: Any) was added to the tracer class (src/judgeval/common/tracer.py line 502) to easily associate an error dictionary with the current trace span.
  • Error Capture in Tracing Wrappers: The tracing wrappers (async_wrapper and wrapper) for both regular and deep tracing have been modified to include try...except Exception blocks. When an exception occurs, the exception details (type, message, traceback) are captured, formatted into a dictionary, and recorded using the new record_error method before the exception is re-raised (src/judgeval/common/tracer.py lines 1163-1174, 1195-1206, 1252-1263, 1285-1296).
  • Serialization Refactor: The serialization logic within TraceSpan was refactored. The separate _serialize_inputs and _serialize_output methods were removed (src/judgeval/data/trace.py lines 46-68, 97-121) and replaced by a single generic _serialize_value method (src/judgeval/data/trace.py line 74) used for inputs, outputs, and the new error field.

Changelog

  • src/judgeval/common/tracer.py
    • Added record_error method to associate errors with the current span (lines 502-508).
    • Updated _trace method to use record_error instead of record_output for exceptions (lines 886-887).
    • Added try...except blocks in async_wrapper to capture and record exceptions using record_error for both deep and regular async paths (lines 1163-1174, 1195-1206).
    • Added try...except blocks in wrapper to capture and record exceptions using record_error for both deep and regular sync paths (lines 1252-1263, 1285-1296).
  • src/judgeval/data/trace.py
    • Added error: Optional[Dict[str, Any]] = None field to TraceSpan model (line 17).
    • Updated model_dump to include the new error field and use the refactored _serialize_value for inputs, outputs, and errors (lines 31-33).
    • Removed the old _serialize_inputs method (lines 46-68).
    • Renamed _serialize_output to _serialize_value and refactored it to handle any value (lines 74-98).
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with πŸ‘ and πŸ‘Ž on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Code runs, then it breaks,
A traceback error makes,
Now spans hold the pain,
So we can debug again,
No more guessing mistakes.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new error field to the TraceSpan model and updates the tracing mechanism to record exceptions and their tracebacks into this field. This is a valuable addition for debugging and monitoring. The core logic for capturing errors is implemented in several places.

The main areas for improvement are adding documentation for the new record_error method, updating a docstring for a refactored method, and addressing significant code duplication in the exception handling logic within the tracer's wrappers. Consolidating the error formatting and recording into a helper function would greatly enhance maintainability.

Summary of Findings

  • Code Duplication in Exception Handling: The logic for formatting exception details (type, message, traceback) and recording them using current_trace.record_error() is duplicated in four places within the async_wrapper and wrapper methods in src/judgeval/common/tracer.py. Similar logic also exists in _DeepTracer._trace. This should be refactored into a common helper function to improve maintainability.
  • Missing Docstring: The new public method record_error in src/judgeval/common/tracer.py lacks a docstring, which makes it harder to understand its purpose and usage without reading the code.
  • Outdated Docstring: The docstring for the refactored method _serialize_value in src/judgeval/data/trace.py (formerly _serialize_output) still refers to serializing only 'output data', whereas it's now used for inputs, outputs, and errors. The docstring should be updated to reflect its broader utility.

Merge Readiness

The pull request introduces a useful feature for error tracking. However, there's a significant instance of code duplication in the exception handling logic that should be addressed by refactoring it into a helper function. Additionally, some docstrings for new and modified methods need to be added or updated for clarity.

Due to the high-severity code duplication issue, I recommend that these changes be addressed before merging. I am not authorized to approve pull requests, so please ensure other reviewers approve these changes once the feedback is incorporated.

@@ -498,7 +498,15 @@ def record_output(self, output: Any):
return span # Return the created entry
# Removed else block - original didn't have one
return None # Return None if no span_id found


def record_error(self, error: Any):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The new record_error method is a good addition for specifically handling errors within a span. Could we add a docstring to explain its purpose, parameters, and what it returns? This would improve code clarity and maintainability for other developers.

Copy link
Collaborator

@yifan1207 yifan1207 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome Abhi! If you have time can you also update our Docs in judeval with these new changes?

@abhishekg999 abhishekg999 requested a review from yifan1207 May 24, 2025 00:30
@abhishekg999 abhishekg999 merged commit e2e30e1 into main May 26, 2025
2 of 3 checks passed
@abhishekg999 abhishekg999 deleted the ahh/new-error-format branch May 26, 2025 08:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants