Skip to content

Add trace ID to datasets, update UTs accordingly #31

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jan 2, 2025

Conversation

SecroLoL
Copy link
Contributor

When adding the trace_id field to Examples, I forgot to add them to EvalDatasets as well. This PR takes care of that issue and also updates the UTs for the EvalDataset class accordingly.

Copy link
Collaborator

@JCamyre JCamyre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


The JSON file is expected to have the following format:
{
"ground_truths": [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does ground truths mean in relation to the examples, I thought the examples has the ground truth (with the actual_output field)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@@ -195,17 +252,21 @@ def add_from_csv(
"Please install pandas to use this method. 'pip install pandas'"
)

df = pd.read_csv(file_path)
df = pd.read_csv(file_path, dtype={'trace_id': str})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What else can dtype be?
A trace_id or trace_file?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pandas naturally reads numbers in data files as ints, not strings (can lead to unexpected behavior)

"ground_truths": [
{
"input": "test input",
"actual_output": null,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the difference between ground_truths and examples that ground_truths don't have an actual_output (this is for the LLM system to produce)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

Copy link
Collaborator

@JCamyre JCamyre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM again, grape UT changes!

Copy link
Collaborator

@JCamyre JCamyre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great UT fixes

@JCamyre JCamyre merged commit ca23e6d into main Jan 2, 2025
2 checks passed
@SecroLoL SecroLoL deleted the add_trace_to_datasets branch February 10, 2025 21:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants