# Datasets

## Quick Summary
In most scenarios, you will have multiple `Example`s that you want to evaluate together.
In `judgeval`, an evaluation dataset (`EvalDataset`) is a collection of `Example`s and/or `GroundTruthExample`s that you can scale evaluations across.

> **Tip:** A `GroundTruthExample` is a specific type of `Example` that does not require the `actual_output` field. This is useful for creating datasets that can be dynamically updated at evaluation time by running your workflow on the `GroundTruthExample`s to create `Example`s.
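
For instance, you can fill in the missing `actual_output` fields at evaluation time by running each `GroundTruthExample` through your workflow. A minimal sketch, assuming `GroundTruthExample` exposes its `input` field as an attribute; `run_workflow` is a hypothetical stand-in for your own pipeline:

```
from judgeval import Example, GroundTruthExample

def run_workflow(input: str) -> str:
    # Hypothetical stand-in for your own LLM workflow
    # (e.g. a chain or agent that produces a response).
    return "..."

ground_truths = [GroundTruthExample(input="What is the capital of France?")]

# Run the workflow on each GroundTruthExample to build
# fully populated Examples for evaluation.
examples = [
    Example(input=gt.input, actual_output=run_workflow(gt.input))
    for gt in ground_truths
]
```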

## Creating a Dataset

Creating an `EvalDataset` is as simple as supplying a list of `Example`s and/or `GroundTruthExample`s.

```
from judgeval import (
    EvalDataset,
    Example,
    GroundTruthExample
)

examples = [Example(input="...", actual_output="..."), Example(input="...", actual_output="..."), ...]
ground_truth_examples = [GroundTruthExample(input="..."), GroundTruthExample(input="..."), ...]

dataset = EvalDataset(examples=examples, ground_truth_examples=ground_truth_examples)
```

You can also add `Example`s and `GroundTruthExample`s to an existing `EvalDataset` using the `add_example` and `add_ground_truth_example` methods.

```
...

dataset.add_example(Example(...))
dataset.add_ground_truth_example(GroundTruthExample(...))
```

## Saving/Loading Datasets

`judgeval` supports saving and loading datasets in the following formats:
- JSON
- CSV

### From Judgment
You can easily save/load an `EvalDataset` to/from Judgment's cloud.

```
# Saving
from judgeval import JudgmentClient

...  # `dataset` is an EvalDataset, as created above

client = JudgmentClient()
client.push_dataset(alias="my_dataset", dataset=dataset)
```

```
# Loading
from judgeval import JudgmentClient

client = JudgmentClient()
dataset = client.pull_dataset(alias="my_dataset")
```

### From JSON

You can save/load an `EvalDataset` with a JSON file. Your JSON file should have the following structure:
```
{
    "examples": [{"input": "...", "actual_output": "..."}, ...],
    "ground_truths": [{"input": "..."}, ...]
}
```

Here's an example of how to use `judgeval` to save/load from JSON.

```
from judgeval import EvalDataset

# saving
dataset = EvalDataset(...)  # filled with examples
dataset.save_as("json", "/path/to/save/dir", "save_name")

# loading
new_dataset = EvalDataset()
new_dataset.add_from_json("/path/to/your/json/file.json")
```

### From CSV

You can save/load an `EvalDataset` with a `.csv` file. Your CSV should contain rows that can be mapped to `Example`s via column names.

> **Note:** The CSV format is not yet finalized, so the exact column mapping may change.
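
For illustration, such a file might look like the following (hypothetical column names that mirror the `Example` fields above, since the mapping is not finalized):

```
input,actual_output
"What is the capital of France?","Paris"
"Who wrote Hamlet?","William Shakespeare"
```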

Here's an example of how to use `judgeval` to save/load from CSV.

```
from judgeval import EvalDataset

# saving
dataset = EvalDataset(...)  # filled with examples
dataset.save_as("csv", "/path/to/save/dir", "save_name")

# loading
new_dataset = EvalDataset()
new_dataset.add_from_csv("/path/to/your/csv/file.csv")
```

## Evaluate On Your Dataset

You can use the `JudgmentClient` to evaluate the `Example`s and `GroundTruthExample`s in your dataset using scorers.

```
...  # `client` is a JudgmentClient, as shown above

dataset = client.pull_dataset(alias="my_dataset")
res = client.evaluate_dataset(
    dataset=dataset,
    scorers=[JudgmentScorer(threshold=0.5, score_type=APIScorer.FAITHFULNESS)],
    model="gpt-4o",
)
```

## Conclusion

Congratulations! You've now learned how to create, save, and evaluate an `EvalDataset` in `judgeval`.

You can also view and manage your datasets via Judgment's platform.