# Datasets

## Quick Summary
In most scenarios, you will have multiple `Example`s that you want to evaluate together.
In `judgeval`, an evaluation dataset (`EvalDataset`) is a collection of `Example`s and/or `GroundTruthExample`s that you can scale evaluations across.

> **Tip:** A `GroundTruthExample` is a specific type of `Example` that does not require the `actual_output` field. This is useful for creating datasets that can be dynamically updated at evaluation time by running your workflow on the `GroundTruthExample`s to create `Example`s.
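
For instance, you can fill in the missing `actual_output` fields at evaluation time by running each `GroundTruthExample` through your workflow. A minimal sketch, assuming `GroundTruthExample` exposes its `input` field as an attribute; `run_workflow` is a hypothetical stand-in for your own pipeline:

```
from judgeval import Example, GroundTruthExample

def run_workflow(input: str) -> str:
    # Hypothetical stand-in for your own LLM workflow
    # (e.g. a chain or agent that produces a response).
    return "..."

ground_truths = [GroundTruthExample(input="What is the capital of France?")]

# Run the workflow on each GroundTruthExample to build
# fully populated Examples for evaluation.
examples = [
    Example(input=gt.input, actual_output=run_workflow(gt.input))
    for gt in ground_truths
]
```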

## Creating a Dataset

Creating an `EvalDataset` is as simple as supplying a list of `Example`s and/or `GroundTruthExample`s.

```
from judgeval import (
    EvalDataset,
    Example,
    GroundTruthExample
)

examples = [Example(input="...", actual_output="..."), Example(input="...", actual_output="..."), ...]
ground_truth_examples = [GroundTruthExample(input="..."), GroundTruthExample(input="..."), ...]

dataset = EvalDataset(examples=examples, ground_truth_examples=ground_truth_examples)
```

You can also add `Example`s and `GroundTruthExample`s to an existing `EvalDataset` using the `add_example` and `add_ground_truth_example` methods.

```
...

dataset.add_example(Example(...))
dataset.add_ground_truth_example(GroundTruthExample(...))
```

## Saving/Loading Datasets

`judgeval` supports saving and loading datasets in the following formats:
- JSON
- CSV

### From Judgment
You can easily save/load an `EvalDataset` to/from Judgment's cloud.

```
# Saving
from judgeval import JudgmentClient

...  # `dataset` is an EvalDataset, as created above

client = JudgmentClient()
client.push_dataset(alias="my_dataset", dataset=dataset)
```

```
# Loading
from judgeval import JudgmentClient

client = JudgmentClient()
dataset = client.pull_dataset(alias="my_dataset")
```

### From JSON

You can save/load an `EvalDataset` with a JSON file. Your JSON file should have the following structure:
```
{
    "examples": [{"input": "...", "actual_output": "..."}, ...],
    "ground_truths": [{"input": "..."}, ...]
}
```

Here's an example of how to use `judgeval` to save/load from JSON.

```
from judgeval import EvalDataset

# saving
dataset = EvalDataset(...)  # filled with examples
dataset.save_as("json", "/path/to/save/dir", "save_name")

# loading
new_dataset = EvalDataset()
new_dataset.add_from_json("/path/to/your/json/file.json")
```

### From CSV

You can save/load an `EvalDataset` with a `.csv` file. Your CSV should contain rows that can be mapped to `Example`s via column names.

> **Note:** The CSV format is not yet finalized, so the exact column mapping may change.
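
For illustration, such a file might look like the following (hypothetical column names that mirror the `Example` fields above, since the mapping is not finalized):

```
input,actual_output
"What is the capital of France?","Paris"
"Who wrote Hamlet?","William Shakespeare"
```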

Here's an example of how to use `judgeval` to save/load from CSV.

```
from judgeval import EvalDataset

# saving
dataset = EvalDataset(...)  # filled with examples
dataset.save_as("csv", "/path/to/save/dir", "save_name")

# loading
new_dataset = EvalDataset()
new_dataset.add_from_csv("/path/to/your/csv/file.csv")
```

## Evaluate On Your Dataset

You can use the `JudgmentClient` to evaluate the `Example`s and `GroundTruthExample`s in your dataset using scorers.

```
...  # `client` is a JudgmentClient, as shown above

dataset = client.pull_dataset(alias="my_dataset")
res = client.evaluate_dataset(
    dataset=dataset,
    scorers=[JudgmentScorer(threshold=0.5, score_type=APIScorer.FAITHFULNESS)],
    model="gpt-4o",
)
```

## Conclusion

Congratulations! You've now learned how to create, save, and evaluate an `EvalDataset` in `judgeval`.

You can also view and manage your datasets via Judgment's platform.