aisheets.md (51 additions & 8 deletions)
@@ -14,7 +14,7 @@ authors:

 **🧭TL;DR**

-Hugging Face AI Sheets is a new, open-source tool for building, enriching, and transforming datasets using AI models with no code. The tool can be deployed locally or on the Hub. It lets you use thousands of open models from the Hugging Face Hub via Inference Providers or local models, including `gpt-oss` from OpenAI!
+Hugging Face AI Sheets is a new, **open-source tool for building, enriching, and transforming datasets using AI models with no code**. The tool can be deployed locally or on the Hub. It lets you use thousands of open models from the Hugging Face Hub via Inference Providers or local models, including `gpt-oss` from OpenAI!

 ## Useful links

@@ -36,9 +36,11 @@ You can use AI Sheets to:

 **Compare and vibe test models.** Imagine you want to test the latest models on your data. You can import a dataset with prompts/questions, and create several columns (one per model) with a prompt like this: `Answer the following: {{prompt}}`, where `prompt` is a column in your dataset. You can validate the results manually or create a new column with an LLM-as-a-judge prompt like this: `Evaluate the responses to the following question: {{prompt}}. Response 1: {{model1}}. Response 2: {{model2}}`, where `model1` and `model2` are columns in your dataset with different model responses.

-**Transform a dataset.** Imagine you want to clean up a column of your dataset. You can add a new column with a prompt like this: `Remove extra punctuation marks from the following text: {{text}}`, where `text`is a column in your dataset containing the texts you want to clean up.
+**Improve prompts for your data and specific models.** Imagine you want to build an application to process customer requests and give automatic answers. You can load a sample dataset with customer requests and start iterating with different prompts and models to generate responses. One cool feature of AI Sheets is that you can provide feedback by editing or validating cells. These example cells will be added to your prompts automatically. You can think of it as a tool to fine-tune prompts and add few-shot examples to them very efficiently, by looking at your data in real time!

-**Classify a dataset.** Imagine you want to classify some content in your dataset. You can add a new column with a prompt like this: `Categorize the following text: {{text}}`, where `text` is a column in your dataset containing the texts you want to categorize.
+**Transform a dataset.** Imagine you want to clean up a column of your dataset. You can add a new column with a prompt like `Remove extra punctuation marks from the following text: {{text}}`, where `text` is a column in your dataset containing the texts you want to clean up.
+
+**Classify a dataset.** Imagine you want to classify some content in your dataset. You can add a new column with a prompt like `Categorize the following text: {{text}}`, where `text` is a column in your dataset containing the texts you want to categorize.

 **Analyze a dataset.** Imagine you want to extract the main ideas in your dataset. You can add a new column with a prompt like this: `Extract the most important ideas from the following: {{text}}`, where `text` is a column in your dataset containing the texts you want to analyze.

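The `{{column}}` placeholders in the prompts above can be read as per-row template substitution. The following is a minimal sketch of that idea in Python. It assumes each row is a plain dict, and `render_prompt` is a hypothetical helper written for illustration; it is not part of AI Sheets, whose actual templating may differ.

```python
import re


def render_prompt(template: str, row: dict) -> str:
    """Replace each {{name}} placeholder with the value of that column in the row.

    Hypothetical helper for illustration only, not an AI Sheets API.
    """
    def substitute(match: re.Match) -> str:
        column = match.group(1)
        if column not in row:
            raise KeyError(f"row has no column named {column!r}")
        return str(row[column])

    # Matches {{name}} and tolerates spaces inside the braces, e.g. {{ name }}.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


row = {"text": "Hello!!! World???"}
prompt = render_prompt(
    "Remove extra punctuation marks from the following text: {{text}}", row
)
print(prompt)
```

The same helper would cover the judge prompt, since `{{prompt}}`, `{{model1}}`, and `{{model2}}` are simply three columns of the same row.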
@@ -98,6 +100,25 @@ Think of this as an auto-dataset or prompt-to-dataset feature—you describe wha
 2. AI Sheets generates the schema and creates 5 sample rows
 3. Extend to up to 1,000 rows or modify the prompt to change structure

+**Example**
+
+If you type this prompt: `cities of the world, alongside countries they belong to and a landmark image for each, generated in Ghibli style`:
+The following sections will show you how to iterate and expand the dataset.
+

 ### Working with your dataset

@@ -118,7 +139,7 @@ Once your data is loaded (regardless of how you started), you'll see it in an ed
 * Translate content
 * Or write custom prompts with "Do something with {{column}}"

-## Refining and expanding the dataset
+### Refining and expanding the dataset

 Now that you have AI columns, you can improve their results and expand your data. You can improve results by providing feedback through manual edits and likes or by adjusting the column configuration. Both require regeneration to take effect.

@@ -135,7 +156,9 @@ Now that you have AI columns, you can improve their results and expand your data

 * **Edit cells:** Click any cell to edit content directly \- this gives the model examples of your preferred output
 * **Like results:** Use thumbs-up to mark examples of good output
-* Regenerate to apply feedback to other cells in the column
+* Regenerate to apply feedback to other cells in the column.
+
+Under the hood, these manually edited and liked cells will be used as few-shot examples for generating the cells when you regenerate or add more cells in the column!

 **3\. Adjust column configuration** Change the prompt, switch models or providers, or modify settings, then regenerate to get better results.

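The few-shot mechanism described above (edited and liked cells reused as examples on regeneration) could be sketched roughly as follows. How AI Sheets actually lays out these examples in the prompt is not shown in this diff, so the `Input:`/`Output:` layout and the `build_few_shot_prompt` helper are assumptions made for illustration.

```python
def build_few_shot_prompt(
    instruction: str,
    examples: list[tuple[str, str]],
    new_input: str,
) -> str:
    """Assemble a prompt from an instruction, validated examples, and a new input.

    Hypothetical helper: `examples` holds (source text, preferred output) pairs,
    standing in for cells the user edited or liked.
    """
    parts = [instruction, ""]
    for source, preferred in examples:
        parts += [f"Input: {source}", f"Output: {preferred}", ""]
    # The final entry is left open so the model completes the output.
    parts += [f"Input: {new_input}", "Output:"]
    return "\n".join(parts)


instruction = "Remove extra punctuation marks from the following text."
examples = [("Hello!!! World???", "Hello! World?")]
prompt = build_few_shot_prompt(instruction, examples, "Wait... what?!?!")
print(prompt)
```

Each additional edited or liked cell would add one more `Input`/`Output` pair, which is why regenerating after giving feedback can change the remaining cells in the column.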
@@ -145,7 +168,7 @@ Now that you have AI columns, you can improve their results and expand your data
 * Edit anytime to change or improve output
 * Column regenerates with new results

-**Switch models / providers**
+**Switch models/providers**

 * Try different models for different performance or compare them.
 * Some are more accurate, creative, or structured than others for specific tasks.
@@ -156,6 +179,27 @@ Now that you have AI columns, you can improve their results and expand your data
 * Enable: Model pulls up-to-date information from the web
 * Disable: Offline, model-only generation

+### Exporting your final dataset to the Hub
+Once you're happy with your new dataset, export it to the Hub! This has the additional benefit of generating a config file you can reuse for (1) generating more data with HF Jobs [using this script](https://huggingface.co/datasets/aisheets/uv-scripts/blob/main/extend_dataset/script.py), and (2) reusing the prompts for downstream applications, including the few-shot examples from your edited and liked cells.
+Here's an [example](https://huggingface.co/datasets/dvilasuero/nemotron-personas-kimi-questions) dataset created with AI Sheets, which [produces this config](https://huggingface.co/datasets/dvilasuero/nemotron-personas-kimi-questions/raw/main/config.yml).
+
+
+### Running data generation scripts using HF Jobs
+If you want to generate a larger dataset, you can use the above-mentioned config and script, like this:
+
+```bash
+# Runs the pipeline script with the config that holds the prompts.
+# --num-rows 100 limits generation to 100 rows; omit it to process the full dataset.
+hf jobs uv run \
+  -s HF_TOKEN=$HF_TOKEN \
+  https://huggingface.co/datasets/aisheets/uv-scripts/raw/main/extend_dataset/script.py \
+  --config https://huggingface.co/datasets/dvilasuero/nemotron-personas-kimi-questions/raw/main/config.yml \
+  --num-rows 100 \
This section provides examples of datasets you can build with AI Sheets to inspire your next project.
@@ -346,5 +390,4 @@ columns:
 ## Next steps
 You can try AI Sheets [without installing anything](https://huggingface.co/spaces/aisheets/sheets) or download and deploy it locally from the [GitHub repo](https://github.com/huggingface/aisheets). To run it locally and get the most out of it, we recommend that you [subscribe to PRO](https://hf.co/pro), which gives you 20x monthly inference usage.

-If you have questions or suggestions, let us know in the [Community tab](https://huggingface.co/spaces/aisheets/sheets/discussions) or by opening an issue on [GitHub](https://github.com/huggingface/aisheets).
-
+If you have questions or suggestions, let us know in the [Community tab](https://huggingface.co/spaces/aisheets/sheets/discussions) or by opening an issue on [GitHub](https://github.com/huggingface/aisheets).