Refactor default judges #36
…new scorer structure
…eds to be manually set.
Take a look at my comments, this is sick
model=model,
metadata={},
log_results=log_results,
project_name="TestSpanLevel",
eval_run_name="TestSpanLevel",
project_name="TestSpanLevel1",  # TODO this should be dynamic
I will handle this, I'll share my thoughts in Slack. In my multi-step eval PR, I added a project and trace name, which will tie in nicely to generate automatic eval run names (to improve UX while remaining clear).
Nice B)
Default judges used to be created with the syntax
We can now create them by directly importing
This is a stylistic change, but it also pays off as we add scorers that require specific arguments, such as the new JSONCorrectness scorer, which needs a schema field:
scorer = JSONCorrectnessScorer(schema=...)
We couldn't have passed an argument like this with the previous syntax, but now we can.
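To illustrate why class-based scorers make per-scorer arguments easy, here is a minimal, self-contained sketch. It is not the code from this PR: ToyScorer and ToyJSONCorrectnessScorer are hypothetical names, and the schema format (a dict mapping required keys to Python types) is an assumption made purely for the example.

```python
import json


class ToyScorer:
    """Hypothetical base class: every scorer carries a pass threshold."""

    def __init__(self, threshold: float = 1.0):
        self.threshold = threshold

    def score(self, output: str) -> float:
        raise NotImplementedError

    def passes(self, output: str) -> bool:
        return self.score(output) >= self.threshold


class ToyJSONCorrectnessScorer(ToyScorer):
    """Hypothetical scorer that needs an extra constructor argument.

    Assumption: `schema` maps each required key to its expected Python type.
    """

    def __init__(self, schema: dict, threshold: float = 1.0):
        super().__init__(threshold)
        self.schema = schema

    def score(self, output: str) -> float:
        # Output that is not valid JSON scores zero.
        try:
            data = json.loads(output)
        except json.JSONDecodeError:
            return 0.0
        # Only JSON objects can be checked against a key -> type schema.
        if not isinstance(data, dict):
            return 0.0
        # Every required key must be present with a value of the expected type.
        ok = all(
            key in data and isinstance(data[key], expected)
            for key, expected in self.schema.items()
        )
        return 1.0 if ok else 0.0


scorer = ToyJSONCorrectnessScorer(schema={"name": str, "age": int})
print(scorer.passes('{"name": "Ada", "age": 36}'))
```

Because each scorer is its own class, an argument such as schema lives in that class's constructor and does not have to be threaded through a shared factory or enum.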
Major changes