Commit 13ebd55

add documentation about extracting part of synthetic samples from checkpoints
1 parent 92a5523 commit 13ebd55

7 files changed: 55 additions, 0 deletions

doc/source/getting_started/examples.rst

Lines changed: 19 additions & 0 deletions
@@ -18,6 +18,8 @@ Images
 * **CelebA dataset (simulator-generated data)**: `This example <CelebA DigiFace1M example_>`__ shows how to generate differentially private synthetic images for the `CelebA dataset`_ using `the generated data from a computer graphics-based renderer for face images <DigiFace1M_>`__.
 * **CelebA dataset (weak simulator)**: `This example <CelebA avatar example_>`__ shows how to generate differentially private synthetic images for the `CelebA dataset`_ using `a rule-based avatar generator <python_avatars_>`__.

+.. _text:
+
 Text
 ----

@@ -39,6 +41,23 @@ These examples follow the experimental settings in the paper `Differentially Pri
 * **Huggingface models**: `See example <PubMed Huggingface example_>`__


+Checkpoint Operation
+--------------------
+
+By default, the above examples save the generated synthetic data (e.g., images, text). They also save checkpoints that contain more complete information about the synthetic data, and these checkpoints can be processed further with the :doc:`data <details/data>` and :doc:`callback <details/callback_and_logger>` APIs. For example, in the :ref:`text` examples, the CSV files of synthetic text contain both the text selected by the histogram and the generated variations of that text. However, the downstream evaluation in `Differentially Private Synthetic Data via Foundation Model APIs 2: Text (ICML 2024 Spotlight) <pe2_paper_>`__ uses only the text selected by the histogram. The following code extracts the selected text from a checkpoint into a new CSV file:
+
+.. code-block:: python
+
+    from pe.data import Data
+    from pe.callback import SaveTextToCSV
+    from pe.constant.data import FROM_LAST_FLAG_COLUMN_NAME
+
+    # Load the synthetic data stored in the checkpoint.
+    data = Data()
+    data.load_checkpoint("<checkpoint path>")
+    # Keep only the text selected by the histogram.
+    data = data.filter({FROM_LAST_FLAG_COLUMN_NAME: 1})
+    # Save the remaining text as CSV in the "from_last" folder.
+    SaveTextToCSV(output_folder="from_last")(data)
+
+
 .. _ImageNet diffusion model: https://github.com/openai/improved-diffusion
 .. _Stable Diffusion: https://huggingface.co/CompVis/stable-diffusion-v1-4
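
To make the filtering step in the documented snippet concrete, here is a minimal standard-library sketch of the idea: keep only the rows whose flag column equals 1. It does not use or reproduce the `pe` library. The column name `from_last_flag`, the helper `keep_rows_matching`, and the sample rows are hypothetical stand-ins chosen for illustration; the real column name is whatever `FROM_LAST_FLAG_COLUMN_NAME` resolves to.

```python
# Hypothetical column name, used only in this illustration.
FLAG_COLUMN = "from_last_flag"


def keep_rows_matching(rows, conditions):
    """Return the rows in which every listed column equals its required value."""
    return [
        row
        for row in rows
        if all(row.get(column) == value for column, value in conditions.items())
    ]


# Made-up rows: flag 1 marks text selected by the histogram,
# flag 0 marks a generated variation of selected text.
samples = [
    {"text": "selected A", FLAG_COLUMN: 1},
    {"text": "variation of A", FLAG_COLUMN: 0},
    {"text": "selected B", FLAG_COLUMN: 1},
    {"text": "variation of B", FLAG_COLUMN: 0},
]

selected = keep_rows_matching(samples, {FLAG_COLUMN: 1})
print([row["text"] for row in selected])  # ['selected A', 'selected B']
```

The conditions are passed as a dictionary to mirror the shape of the `data.filter({...})` call in the documented snippet; how the library evaluates that dictionary internally is not shown in this commit.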

example/text/openreview_huggingface/main.py

Lines changed: 6 additions & 0 deletions
@@ -8,6 +8,12 @@
 following link for an example:
 https://github.com/microsoft/DPSDA/blob/main/pe/llm/huggingface/register_fastchat/gpt2.py

+The saved CSV files contain both the text selected by the histogram and the generated variations of the selected text,
+while the original paper (https://arxiv.org/abs/2403.01749) only uses the text selected by the histogram for downstream
+evaluation. We can extract the desired text from the saved checkpoints; please see
+https://microsoft.github.io/DPSDA/getting_started/examples.html#checkpoint-operation
+for more details.
+
 For detailed information about parameters and APIs, please consult the documentation of the Private Evolution library:
 https://microsoft.github.io/DPSDA/.
 """
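
After extracting the selected text, a quick sanity check is to count the rows in the resulting CSV files and compare that with the number of samples you expect. The following is a sketch under stated assumptions, not part of the library: it assumes the output folder contains `.csv` files whose first row is a header, and the helper `count_csv_rows` and the file name `samples.csv` are hypothetical. The `csv` module is used instead of counting lines because a single text sample may itself contain newlines.

```python
import csv
import tempfile
from pathlib import Path


def count_csv_rows(folder):
    """Map each CSV file under `folder` (searched recursively) to its data-row count."""
    counts = {}
    for path in sorted(Path(folder).rglob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row (assumed to be present)
            counts[str(path.relative_to(folder))] = sum(1 for _ in reader)
    return counts


# Demonstration on a temporary stand-in folder, so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "from_last"
    folder.mkdir()
    with (folder / "samples.csv").open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["text"])
        writer.writerow(["first review"])
        writer.writerow(["second review,\nspanning two lines"])
    demo_counts = count_csv_rows(folder)

print(demo_counts)  # {'samples.csv': 2}
```

To check a real run, call `count_csv_rows` on the folder passed as `output_folder` in the extraction snippet.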

example/text/openreview_openai/main.py

Lines changed: 6 additions & 0 deletions
@@ -29,6 +29,12 @@
 ```
 See https://github.com/theskumar/python-dotenv for more information about the .env file.

+The saved CSV files contain both the text selected by the histogram and the generated variations of the selected text,
+while the original paper (https://arxiv.org/abs/2403.01749) only uses the text selected by the histogram for downstream
+evaluation. We can extract the desired text from the saved checkpoints; please see
+https://microsoft.github.io/DPSDA/getting_started/examples.html#checkpoint-operation
+for more details.
+
 For detailed information about parameters and APIs, please consult the documentation of the Private Evolution library:
 https://microsoft.github.io/DPSDA/.
 """

example/text/pubmed_huggingface/main.py

Lines changed: 6 additions & 0 deletions
@@ -8,6 +8,12 @@
 following link for an example:
 https://github.com/microsoft/DPSDA/blob/main/pe/llm/huggingface/register_fastchat/gpt2.py

+The saved CSV files contain both the text selected by the histogram and the generated variations of the selected text,
+while the original paper (https://arxiv.org/abs/2403.01749) only uses the text selected by the histogram for downstream
+evaluation. We can extract the desired text from the saved checkpoints; please see
+https://microsoft.github.io/DPSDA/getting_started/examples.html#checkpoint-operation
+for more details.
+
 For detailed information about parameters and APIs, please consult the documentation of the Private Evolution library:
 https://microsoft.github.io/DPSDA/.
 """

example/text/pubmed_openai/main.py

Lines changed: 6 additions & 0 deletions
@@ -29,6 +29,12 @@
 ```
 See https://github.com/theskumar/python-dotenv for more information about the .env file.

+The saved CSV files contain both the text selected by the histogram and the generated variations of the selected text,
+while the original paper (https://arxiv.org/abs/2403.01749) only uses the text selected by the histogram for downstream
+evaluation. We can extract the desired text from the saved checkpoints; please see
+https://microsoft.github.io/DPSDA/getting_started/examples.html#checkpoint-operation
+for more details.
+
 For detailed information about parameters and APIs, please consult the documentation of the Private Evolution library:
 https://microsoft.github.io/DPSDA/.
 """

example/text/yelp_huggingface/main.py

Lines changed: 6 additions & 0 deletions
@@ -8,6 +8,12 @@
 following link for an example:
 https://github.com/microsoft/DPSDA/blob/main/pe/llm/huggingface/register_fastchat/gpt2.py

+The saved CSV files contain both the text selected by the histogram and the generated variations of the selected text,
+while the original paper (https://arxiv.org/abs/2403.01749) only uses the text selected by the histogram for downstream
+evaluation. We can extract the desired text from the saved checkpoints; please see
+https://microsoft.github.io/DPSDA/getting_started/examples.html#checkpoint-operation
+for more details.
+
 For detailed information about parameters and APIs, please consult the documentation of the Private Evolution library:
 https://microsoft.github.io/DPSDA/.
 """

example/text/yelp_openai/main.py

Lines changed: 6 additions & 0 deletions
@@ -29,6 +29,12 @@
 ```
 See https://github.com/theskumar/python-dotenv for more information about the .env file.

+The saved CSV files contain both the text selected by the histogram and the generated variations of the selected text,
+while the original paper (https://arxiv.org/abs/2403.01749) only uses the text selected by the histogram for downstream
+evaluation. We can extract the desired text from the saved checkpoints; please see
+https://microsoft.github.io/DPSDA/getting_started/examples.html#checkpoint-operation
+for more details.
+
 For detailed information about parameters and APIs, please consult the documentation of the Private Evolution library:
 https://microsoft.github.io/DPSDA/.
 """
