Commit 9571376

Add export, examples and global review
1 parent 29077df commit 9571376

1 file changed

+51
-8
lines changed

aisheets.md

Lines changed: 51 additions & 8 deletions
@@ -14,7 +14,7 @@ authors:
1414

1515
**🧭TL;DR**
1616

17-
Hugging Face AI Sheets is a new, open-source tool for building, enriching, and transforming datasets using AI models with no code. The tool can be deployed locally or on the Hub. It lets you use thousands of open models from the Hugging Face Hub via Inference Providers or local models, including `gpt-oss` from OpenAI!
17+
Hugging Face AI Sheets is a new, **open-source tool for building, enriching, and transforming datasets using AI models with no code**. The tool can be deployed locally or on the Hub. It lets you use thousands of open models from the Hugging Face Hub via Inference Providers or local models, including `gpt-oss` from OpenAI!
1818

1919
## Useful links
2020

@@ -36,9 +36,11 @@ You can use AI Sheets to:
3636

3737
**Compare and vibe test models.** Imagine you want to test the latest models on your data. You can import a dataset with prompts/questions, and create several columns (one per model) with a prompt like this: `Answer the following: {{prompt}}`, where `prompt` is a column in your dataset. You can validate the results manually or create a new column with an LLM as a judge prompt like this: `Evaluate the responses to the following question: {{prompt}}. Response 1: {{model1}}. Response 2: {{model2}}`, where `model1` and `model2` are columns in your dataset with different model responses.
3838

39-
**Transform a dataset.** Imagine you want to clean up a column of your dataset. You can add a new column with a prompt like this: `Remove extra punctuation marks from the following text: {{text}}`, where `text` is a column in your dataset containing the texts you want to clean up.
39+
**Improve prompts for your data and specific models.** Imagine you want to build an application to process customer requests and give automatic answers. You can load a sample dataset with customer requests and iterate on different prompts and models to generate responses. One cool feature of AI Sheets is that you can provide feedback by editing or validating cells. These example cells will be added to your prompts automatically. You can think of it as a tool to fine-tune prompts and add few-shot examples to them very efficiently, by looking at your data in real time!
4040

41-
**Classify a dataset.** Imagine you want to classify some content in your dataset. You can add a new column with a prompt like this: `Categorize the following text: {{text}}`, where `text` is a column in your dataset containing the texts you want to categorize.
41+
**Transform a dataset.** Imagine you want to clean up a column of your dataset. You can add a new column with a prompt like `Remove extra punctuation marks from the following text: {{text}}`, where `text` is a column in your dataset containing the texts you want to clean up.
42+
43+
**Classify a dataset.** Imagine you want to classify some content in your dataset. You can add a new column with a prompt like `Categorize the following text: {{text}}`, where `text` is a column in your dataset containing the texts you want to categorize.
4244

4345
**Analyze a dataset.** Imagine you want to extract the main ideas in your dataset. You can add a new column with a prompt like this: `Extract the most important ideas from the following: {{text}}`, where `text` is a column in your dataset containing the texts you want to analyze.
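
To make the `{{column}}` placeholders in the prompts above concrete, here is a minimal Python sketch of how such a template could be filled in for one row. This is an illustration only, not how AI Sheets implements it: `render_prompt` and the sample row are hypothetical names and values introduced for this example.

```python
import re


def render_prompt(template: str, row: dict[str, str]) -> str:
    """Replace every {{column}} placeholder with that column's value in `row`.

    Hypothetical helper for illustration; AI Sheets' own rendering may differ.
    """

    def substitute(match: re.Match) -> str:
        column = match.group(1)
        if column not in row:
            # Fail loudly instead of silently leaving a placeholder unfilled.
            raise KeyError(f"Row has no column named {column!r}")
        return str(row[column])

    # Matches {{name}} and tolerates spaces inside the braces, e.g. {{ name }}.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# One row of a dataset with a question and two model responses (made-up values).
row = {
    "prompt": "Name the capital of France",
    "model1": "Paris",
    "model2": "Lyon",
}

judge_template = (
    "Evaluate the responses to the following question: {{prompt}}. "
    "Response 1: {{model1}}. Response 2: {{model2}}"
)

print(render_prompt(judge_template, row))
```

Each row of the dataset produces its own rendered prompt, which is what gets sent to the model selected for the column.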
4446

@@ -98,6 +100,25 @@ Think of this as an auto-dataset or prompt-to-dataset feature—you describe wha
98100
2. AI Sheets generates the schema and creates 5 sample rows
99101
3. Extend to up to 1,000 rows or modify the prompt to change structure
100102

103+
**Example**
104+
105+
If you type this prompt: `cities of the world, alongside countries they belong to and a landmark image for each, generated in Ghibli style`:
106+
107+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/60420dccc15e823a685f2b03/A8n7AE9DnhaVvaQubxYat.png)
108+
109+
AI Sheets will automatically generate a dataset with three columns, as shown below:
110+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/60420dccc15e823a685f2b03/9SeZR4rBHuIDYLzosUDcv.png)
111+
112+
113+
114+
This dataset contains only five rows, but you can add more cells by dragging down on each column:
115+
116+
117+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/60420dccc15e823a685f2b03/A5xWDSJMrcVMX2dRRQb1Q.png)
118+
119+
120+
The following sections will show you how to iterate and expand the dataset.
121+
101122

102123
### Working with your dataset
103124

@@ -118,7 +139,7 @@ Once your data is loaded (regardless of how you started), you'll see it in an ed
118139
* Translate content
119140
* Or write custom prompts with "Do something with {{column}}"
120141

121-
## Refining and expanding the dataset
142+
### Refining and expanding the dataset
122143

123144
Now that you have AI columns, you can improve their results and expand your data. You can improve results by providing feedback through manual edits and likes or by adjusting the column configuration. Both require regeneration to take effect.
124145

@@ -135,7 +156,9 @@ Now that you have AI columns, you can improve their results and expand your data
135156

136157
* **Edit cells:** Click any cell to edit content directly \- this gives the model examples of your preferred output
137158
* **Like results:** Use thumbs-up to mark examples of good output
138-
* Regenerate to apply feedback to other cells in the column
159+
* Regenerate to apply feedback to other cells in the column.
160+
161+
Under the hood, these manually edited and liked cells will be used as few-shot examples for generating the cells when you regenerate or add more cells in the column!
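
As a rough illustration of that idea, the sketch below assembles a prompt from cells that were edited or liked. The exact prompt format AI Sheets uses is not described here, so the `Cell` class, the `build_few_shot_prompt` function, and the `Input:`/`Output:` layout are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One row of an AI column (hypothetical structure, for illustration only)."""

    source_text: str      # value of the input column for this row
    output: str           # generated or manually edited value of the AI column
    edited: bool = False  # the user edited this cell by hand
    liked: bool = False   # the user gave this cell a thumbs-up


def build_few_shot_prompt(instruction: str, cells: list[Cell], new_input: str) -> str:
    """Prepend edited or liked cells as few-shot examples before the new input."""
    examples = [cell for cell in cells if cell.edited or cell.liked]
    lines = [instruction, ""]
    for example in examples:
        lines += [f"Input: {example.source_text}", f"Output: {example.output}", ""]
    # The model is asked to complete the output for the new row.
    lines += [f"Input: {new_input}", "Output:"]
    return "\n".join(lines)


cells = [
    Cell("Great product!!!", "Great product!", edited=True),
    Cell("So,,, slow...", "So slow.", liked=True),
    Cell("Meh???", "Meh???"),  # neither edited nor liked, so not used as an example
]

print(
    build_few_shot_prompt(
        "Remove extra punctuation marks from the following text.",
        cells,
        "Why??? Just why!!",
    )
)
```

Only cells carrying explicit feedback are used as examples, which is why both kinds of feedback require a regeneration before they affect other cells.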
139162

140163
**3\. Adjust column configuration** Change the prompt, switch models or providers, or modify settings, then regenerate to get better results.
141164

@@ -145,7 +168,7 @@ Now that you have AI columns, you can improve their results and expand your data
145168
* Edit anytime to change or improve output
146169
* Column regenerates with new results
147170

148-
**Switch models / providers**
171+
**Switch models/providers**
149172

150173
* Try different models for different performance or compare them.
151174
* Some are more accurate, creative, or structured than others for specific tasks.
@@ -156,6 +179,27 @@ Now that you have AI columns, you can improve their results and expand your data
156179
* Enable: Model pulls up-to-date information from the web
157180
* Disable: Offline, model-only generation
158181

182+
### Exporting your final dataset to the Hub
183+
Once you're happy with your new dataset, export it to the Hub! This also generates a config file that you can reuse to (1) generate more data with HF Jobs [using this script](https://huggingface.co/datasets/aisheets/uv-scripts/blob/main/extend_dataset/script.py), and (2) reuse the prompts in downstream applications, including the few-shot examples from your edited and liked cells.
184+
185+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/60420dccc15e823a685f2b03/9To_YsUYVyJSqfL0SAJDW.png)
186+
187+
188+
Here's an [example](https://huggingface.co/datasets/dvilasuero/nemotron-personas-kimi-questions) dataset created with AI Sheets, which [produces this config](https://huggingface.co/datasets/dvilasuero/nemotron-personas-kimi-questions/raw/main/config.yml).
189+
190+
191+
### Running data generation scripts using HF Jobs
192+
If you want to generate a larger dataset, you can run the script with the config mentioned above. In the command below, the first URL is the script that runs the pipeline, `--config` points to the config with the prompts, and `--num-rows 100` limits generation to 100 rows (omit it to process the full dataset):
193+
194+
```bash
195+
hf jobs uv run \
196+
-s HF_TOKEN=$HF_TOKEN \
197+
https://huggingface.co/datasets/aisheets/uv-scripts/raw/main/extend_dataset/script.py \
198+
--config https://huggingface.co/datasets/dvilasuero/nemotron-personas-kimi-questions/raw/main/config.yml \
199+
--num-rows 100 \
200+
nvidia/Nemotron-Personas dvilasuero/nemotron-kimi-qa-distilled
201+
```
202+
159203
## Examples
160204

161205
This section provides examples of datasets you can build with AI Sheets to inspire your next project.
@@ -346,5 +390,4 @@ columns:
346390
## Next steps
347391
You can try AI Sheets [without installing anything](https://huggingface.co/spaces/aisheets/sheets) or download and deploy it locally from the [GitHub repo](https://github.com/huggingface/aisheets). To run it locally and get the most out of it, we recommend that you [subscribe to PRO](https://hf.co/pro) to get 20x monthly inference usage.
348392
349-
If you have questions or suggestions, let us know in the [Community tab](https://huggingface.co/spaces/aisheets/sheets/discussions) or by opening an issue on [GitHub](https://github.com/huggingface/aisheets).
350-
393+
If you have questions or suggestions, let us know in the [Community tab](https://huggingface.co/spaces/aisheets/sheets/discussions) or by opening an issue on [GitHub](https://github.com/huggingface/aisheets).
