
Commit 8765919

Merge pull request #35 from JudgmentLabs/add_dev_docs
Add developer docs
2 parents b6944b3 + e3a4aea commit 8765919

28 files changed: +1411 -0 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -17,6 +17,8 @@ wheels/
 .installed.cfg
 *.egg-info/

+# APIs
+google-cloud-sdk/
 # PyInstaller
 # Usually these files are written by a python script from a template
 # before PyInstaller builds the exe, so as to inject date/other infos into it.

docs

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
Subproject commit 00ab9e46926eff3a327b4646a0bd71a1f2ed0650
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
# Datasets

## Quick Summary
In most scenarios, you will have multiple `Example`s that you want to evaluate together.
In `judgeval`, an evaluation dataset (`EvalDataset`) is a collection of `Example`s and/or `GroundTruthExample`s that you can scale evaluations across.

`Tip`:

A `GroundTruthExample` is a specific type of `Example` that does not require the `actual_output` field. This is useful for creating datasets that can be dynamically updated at evaluation time by running your workflow on the `GroundTruthExample`s to create `Example`s.
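As a rough illustration of that pattern, here is a minimal sketch. It assumes a hypothetical `run_workflow` function standing in for your own LLM workflow, and that a `GroundTruthExample` exposes its `input` field as an attribute:

```python
from judgeval import EvalDataset, Example, GroundTruthExample

def run_workflow(prompt: str) -> str:
    # Hypothetical stand-in for your real LLM workflow.
    return f"generated answer for: {prompt}"

ground_truths = [
    GroundTruthExample(input="What is the capital of France?"),
    GroundTruthExample(input="Summarize the attached contract."),
]

# At evaluation time, run the workflow on each GroundTruthExample to
# produce full Examples that include an actual_output field.
examples = [
    Example(input=gt.input, actual_output=run_workflow(gt.input))
    for gt in ground_truths
]

dataset = EvalDataset(examples=examples)
```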

## Creating a Dataset

Creating an `EvalDataset` is as simple as supplying a list of `Example`s and/or `GroundTruthExample`s.

```python
from judgeval import (
    EvalDataset,
    Example,
    GroundTruthExample
)

examples = [Example(input="...", actual_output="..."), Example(input="...", actual_output="..."), ...]
ground_truth_examples = [GroundTruthExample(input="..."), GroundTruthExample(input="..."), ...]

dataset = EvalDataset(examples=examples, ground_truth_examples=ground_truth_examples)
```

You can also add `Example`s and `GroundTruthExample`s to an existing `EvalDataset` using the `add_example` and `add_ground_truth` methods.

```python
...

dataset.add_example(Example(...))
dataset.add_ground_truth(GroundTruthExample(...))
```

## Saving/Loading Datasets

`judgeval` supports saving and loading datasets in the following formats:
- JSON
- CSV

### From Judgment
You can easily save/load an `EvalDataset` from Judgment's cloud.

```python
# Saving
...
from judgeval import JudgmentClient

client = JudgmentClient()
client.push_dataset(alias="my_dataset", dataset=dataset)
```

```python
# Loading
from judgeval import JudgmentClient

client = JudgmentClient()
dataset = client.pull_dataset(alias="my_dataset")
```

### From JSON

You can save/load an `EvalDataset` with a JSON file. Your JSON file should have the following structure:
```json
{
    "examples": [{"input": "...", "actual_output": "..."}, ...],
    "ground_truths": [{"input": "..."}, ...]
}
```

Here's an example of how to use `judgeval` to save/load from JSON.

```python
from judgeval import EvalDataset

# saving
dataset = EvalDataset(...)  # filled with examples
dataset.save_as("json", "/path/to/save/dir", "save_name")

# loading
new_dataset = EvalDataset()
new_dataset.add_from_json("/path/to/your/json/file.json")
```

### From CSV

You can save/load an `EvalDataset` with a `.csv` file. Your CSV should contain rows that can be mapped to `Example`s via column names.
TODO: this section needs to be updated because the CSV format is not yet finalized.

Here's an example of how to use `judgeval` to save/load from CSV.

```python
from judgeval import EvalDataset

# saving
dataset = EvalDataset(...)  # filled with examples
dataset.save_as("csv", "/path/to/save/dir", "save_name")

# loading
new_dataset = EvalDataset()
new_dataset.add_from_csv("/path/to/your/csv/file.csv")
```

## Evaluate On Your Dataset

You can use the `JudgmentClient` to evaluate the `Example`s and `GroundTruthExample`s in your dataset using scorers.

```python
...

dataset = client.pull_dataset(alias="my_dataset")
res = client.evaluate_dataset(
    dataset=dataset,
    scorers=[JudgmentScorer(threshold=0.5, score_type=APIScorer.FAITHFULNESS)],
    model="gpt-4o",
)
```

## Conclusion

Congratulations! You've now learned how to create, save, and evaluate an `EvalDataset` in `judgeval`.

You can also view and manage your datasets via Judgment's platform. Check out TODO: add link here
