This repository was archived by the owner on Jul 7, 2023. It is now read-only.

Commit 56cb37f

Merge pull request #290 from rsepassi/push
v1.2.2
2 parents 8f83adf + b8e59e7 commit 56cb37f

53 files changed, +1822 -730 lines changed

README.md

Lines changed: 2 additions & 1 deletion
@@ -214,7 +214,8 @@ on the task (e.g. fed through a final linear transform to produce logits for a
 softmax over classes). All models are imported in
 [`models.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/models/models.py),
 inherit from `T2TModel` - defined in
-[`t2t_model.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/t2t_model.py) - and are registered with
+[`t2t_model.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/t2t_model.py) -
+and are registered with
 [`@registry.register_model`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/registry.py).
 
 ### Hyperparameter Sets

docs/new_problem.md

Lines changed: 32 additions & 8 deletions
@@ -15,9 +15,17 @@ Let's add a new dataset together and train the transformer model. We'll be learn
 
 For each problem we want to tackle we create a new problem class and register it. Let's call our problem `Word2def`.
 
-Since many text2text problems share similar methods, there's already a class called [`Text2TextProblem`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/problem.py#L354) that extends the base problem class, `Problem` (both found in [`problem.py`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/problem.py)).
-
-For our problem, we can go ahead and create the file `word2def.py` in the [`data_generators`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/) folder and add our new problem, `Word2def`, which extends [`Text2TextProblem`](https://github.com/tensorflow/tensor2tensor/blob/24071ba07d5a14c170044c5e60a24bda8179fb7a/tensor2tensor/data_generators/problem.py#L354). Let's also register it while we're at it so we can specify the problem through flags.
+Since many text2text problems share similar methods, there's already a class
+called `Text2TextProblem` that extends the base problem class, `Problem`
+(both found in
+[`problem.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/problem.py)).
+
+For our problem, we can go ahead and create the file `word2def.py` in the
+[`data_generators`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/)
+folder and add our new problem, `Word2def`, which extends
+[`Text2TextProblem`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/problem.py).
+Let's also register it while we're at it so we can specify the problem through
+flags.
 
 ```python
 @registry.register_problem
@@ -28,7 +36,9 @@ class Word2def(problem.Text2TextProblem):
 ...
 ```
 
-We need to implement the following methods from [`Text2TextProblem`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/problem.py#L354) in our new class:
+We need to implement the following methods from
+[`Text2TextProblem`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/problem.py).
+in our new class:
 * is_character_level
 * targeted_vocab_size
 * generator
@@ -42,7 +52,12 @@ Let's tackle them one by one:
 
 **input_space_id, target_space_id, is_character_level, targeted_vocab_size, use_subword_tokenizer**:
 
-SpaceIDs tell Tensor2Tensor what sort of space the input and target tensors are in. These are things like, EN_CHR (English character), EN_TOK (English token), AUDIO_WAV (audio waveform), IMAGE, DNA (genetic bases). The complete list can be found at [`data_generators/problem.py`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/problem.py) in the class `SpaceID`.
+SpaceIDs tell Tensor2Tensor what sort of space the input and target tensors are
+in. These are things like, EN_CHR (English character), EN_TOK (English token),
+AUDIO_WAV (audio waveform), IMAGE, DNA (genetic bases). The complete list can be
+found at
+[`data_generators/problem.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/problem.py).
+in the class `SpaceID`.
 
 Since we're generating definitions and feeding in words at the character level, we set `is_character_level` to true, and use the same SpaceID, EN_CHR, for both input and target. Additionally, since we aren't using tokens, we don't need to give a `targeted_vocab_size` or define `use_subword_tokenizer`.
 
@@ -58,7 +73,7 @@ The number of shards to break data files into.
 @registry.register_problem()
 class Word2def(problem.Text2TextProblem):
   """Problem spec for English word to dictionary definition."""
-
+
   @property
   def is_character_level(self):
     return True
@@ -86,7 +101,15 @@ class Word2def(problem.Text2TextProblem):
 
 **generator**:
 
-We're almost done. `generator` generates the training and evaluation data and stores them in files like "word2def_train.lang1" in your DATA_DIR. Thankfully several commonly used methods like `character_generator`, and `token_generator` are already written in the file [`wmt.py`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/wmt.py). We will import `character_generator` and [`text_encoder`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/text_encoder.py) to write:
+We're almost done. `generator` generates the training and evaluation data and
+stores them in files like "word2def_train.lang1" in your DATA_DIR. Thankfully
+several commonly used methods like `character_generator`, and `token_generator`
+are already written in the file
+[`wmt.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/wmt.py).
+We will import `character_generator` and
+[`text_encoder`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/text_encoder.py)
+to write:
+
 ```python
 def generator(self, data_dir, tmp_dir, train):
   character_vocab = text_encoder.ByteTextEncoder()
@@ -152,6 +175,7 @@ _WORD2DEF_TEST_DATASETS = [
 ## Putting it all together
 
 Now our `word2def.py` file looks like:
+
 ```python
 """ Problem definition for word to dictionary definition.
 """
@@ -210,7 +234,7 @@ class Word2def(problem.Text2TextProblem):
 ```
 
 # Hyperparameters
-All hyperparamters inherit from `_default_hparams()` in `problem.py.` If you would like to customize your hyperparameters, register a new hyperparameter set in `word2def.py` like the example provided in the walkthrough. For example:
+All hyperparamters inherit from `_default_hparams()` in `problem.py.` If you would like to customize your hyperparameters, register a new hyperparameter set in `word2def.py` like the example provided in the walkthrough. For example:
 
 ```python
 from tensor2tensor.models import transformer

setup.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 
 setup(
     name='tensor2tensor',
-    version='1.2.1',
+    version='1.2.2',
     description='Tensor2Tensor',
     author='Google Inc.',
     author_email='no-reply@google.com',

tensor2tensor/data_generators/all_problems.py

Lines changed: 0 additions & 1 deletion
@@ -45,4 +45,3 @@
   pass
 # pylint: enable=g-import-not-at-top
 # pylint: enable=unused-import
-

tensor2tensor/data_generators/cnn_dailymail.py

Lines changed: 2 additions & 2 deletions
@@ -53,8 +53,8 @@ def _maybe_download_corpora(tmp_dir):
     filepath of the downloaded corpus file.
   """
   cnn_filename = "cnn_stories.tgz"
-  dailymail_filename = "dailymail_stories.tgz"
   cnn_finalpath = os.path.join(tmp_dir, "cnn/stories/")
+  dailymail_filename = "dailymail_stories.tgz"
   dailymail_finalpath = os.path.join(tmp_dir, "dailymail/stories/")
   if not tf.gfile.Exists(cnn_finalpath):
     cnn_file = generator_utils.maybe_download_from_drive(
@@ -63,7 +63,7 @@ def _maybe_download_corpora(tmp_dir):
       cnn_tar.extractall(tmp_dir)
   if not tf.gfile.Exists(dailymail_finalpath):
     dailymail_file = generator_utils.maybe_download_from_drive(
-        tmp_dir, dailymail_filename, _CNN_STORIES_DRIVE_URL)
+        tmp_dir, dailymail_filename, _DAILYMAIL_STORIES_DRIVE_URL)
     with tarfile.open(dailymail_file, "r:gz") as dailymail_tar:
       dailymail_tar.extractall(tmp_dir)
   return [cnn_finalpath, dailymail_finalpath]

tensor2tensor/data_generators/gene_expression.py

Lines changed: 2 additions & 3 deletions
@@ -142,7 +142,7 @@ def generate_data(self, data_dir, tmp_dir, task_id=-1):
     # Shuffle
     generator_utils.shuffle_dataset(all_filepaths)
 
-  def hparams(self, defaults, model_hparams):
+  def hparams(self, defaults, unused_model_hparams):
     p = defaults
     vocab_size = self._encoders["inputs"].vocab_size
     p.input_modality = {"inputs": (registry.Modalities.SYMBOL, vocab_size)}
@@ -159,9 +159,8 @@ def example_reading_spec(self):
     data_items_to_decoders = None
     return (data_fields, data_items_to_decoders)
 
-  def preprocess_examples(self, examples, mode, hparams):
+  def preprocess_examples(self, examples, mode, unused_hparams):
     del mode
-    del hparams
 
     # Reshape targets to contain num_output_predictions per output timestep
     examples["targets"] = tf.reshape(examples["targets"],
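
Review note: this hunk swaps an explicit `del hparams` for an `unused_` prefix on the parameter name. Both styles mark an argument as intentionally ignored; the sketch below contrasts them with two hypothetical free functions (not code from this repository). One caveat that holds for any Python rename of this kind: a caller that passes the argument by keyword, e.g. `hparams=...`, would break after the rename, whereas positional callers are unaffected.

```python
def preprocess_with_del(examples, mode, hparams):
  """Marks unused arguments by deleting them in the body."""
  del mode, hparams  # Intentionally unused.
  return examples


def preprocess_with_prefix(examples, unused_mode, unused_hparams):
  """Marks unused arguments through their names instead."""
  return examples


batch = {"targets": [1, 2, 3]}
same_result = (preprocess_with_del(batch, "train", None) ==
               preprocess_with_prefix(batch, "train", None))
```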

tensor2tensor/data_generators/ice_parsing.py

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ def generate_data(self, data_dir, tmp_dir, task_id=-1):
             self.targeted_vocab_size),
         self.dev_filepaths(data_dir, 1, shuffled=False))
 
-  def hparams(self, defaults, model_hparams):
+  def hparams(self, defaults, unused_model_hparams):
     p = defaults
     source_vocab_size = self._encoders["inputs"].vocab_size
     p.input_modality = {"inputs": (registry.Modalities.SYMBOL,

tensor2tensor/data_generators/image.py

Lines changed: 32 additions & 9 deletions
@@ -105,7 +105,7 @@ def resize(img, size):
     examples["targets"] = resize(inputs, 32)
     return examples
 
-  def hparams(self, defaults, model_hparams):
+  def hparams(self, defaults, unused_model_hparams):
     p = defaults
     p.input_modality = {"inputs": ("image:identity_no_pad", None)}
     p.target_modality = ("image:identity_no_pad", None)
@@ -229,7 +229,7 @@ def feature_encoders(self, data_dir):
         "targets": text_encoder.SubwordTextEncoder(vocab_filename)
     }
 
-  def hparams(self, defaults, model_hparams):
+  def hparams(self, defaults, unused_model_hparams):
     p = defaults
     p.input_modality = {"inputs": (registry.Modalities.IMAGE, None)}
     vocab_size = self._encoders["targets"].vocab_size
@@ -264,10 +264,21 @@ def train_shards(self):
   def dev_shards(self):
     return 1
 
+  @property
+  def class_labels(self):
+    return ["ID_%d" % i for i in range(self.num_classes)]
+
+  def feature_encoders(self, data_dir):
+    del data_dir
+    return {
+        "inputs": text_encoder.TextEncoder(),
+        "targets": text_encoder.ClassLabelEncoder(self.class_labels)
+    }
+
   def generator(self, data_dir, tmp_dir, is_training):
     raise NotImplementedError()
 
-  def hparams(self, defaults, model_hparams):
+  def hparams(self, defaults, unused_model_hparams):
     p = defaults
     small_modality = "%s:small_image_modality" % registry.Modalities.IMAGE
     modality = small_modality if self.is_small else registry.Modalities.IMAGE
@@ -302,7 +313,7 @@ def resize(img):
       return tf.to_int64(tf.image.resize_images(img, [299, 299]))
 
     inputs = tf.cast(examples["inputs"], tf.int64)
-    if mode == tf.contrib.learn.ModeKeys.TRAIN:
+    if mode == tf.estimator.ModeKeys.TRAIN:
       examples["inputs"] = tf.cond( # Preprocess 90% of the time.
           tf.less(tf.random_uniform([]), 0.9),
           lambda img=inputs: preprocess(img),
@@ -349,7 +360,7 @@ def is_small(self):
   def num_classes(self):
     return 1000
 
-  def preprocess_examples(self, examples, mode, hparams):
+  def preprocess_examples(self, examples, mode, unused_hparams):
     # Just resize with area.
     if self._was_reversed:
       examples["inputs"] = tf.to_int64(
@@ -491,6 +502,10 @@ def is_small(self):
   def num_classes(self):
     return 10
 
+  @property
+  def class_labels(self):
+    return [str(c) for c in range(self.num_classes)]
+
   @property
   def train_shards(self):
     return 10
@@ -564,9 +579,17 @@ def cifar10_generator(tmp_dir, training, how_many, start_from=0):
 
 @registry.register_problem
 class ImageCifar10Tune(ImageMnistTune):
+  """Cifar-10 Tune."""
+
+  @property
+  def class_labels(self):
+    return [
+        "airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse",
+        "ship", "truck"
+    ]
 
-  def preprocess_examples(self, examples, mode, hparams):
-    if mode == tf.contrib.learn.ModeKeys.TRAIN:
+  def preprocess_examples(self, examples, mode, unused_hparams):
+    if mode == tf.estimator.ModeKeys.TRAIN:
       examples["inputs"] = common_layers.cifar_image_augmentation(
           examples["inputs"])
     return examples
@@ -591,7 +614,7 @@ def generator(self, data_dir, tmp_dir, is_training):
 @registry.register_problem
 class ImageCifar10Plain(ImageCifar10):
 
-  def preprocess_examples(self, examples, mode, hparams):
+  def preprocess_examples(self, examples, mode, unused_hparams):
     return examples
 
 
@@ -730,7 +753,7 @@ def feature_encoders(self, data_dir):
     encoder = text_encoder.SubwordTextEncoder(vocab_filename)
     return {"targets": encoder}
 
-  def hparams(self, defaults, model_hparams):
+  def hparams(self, defaults, unused_model_hparams):
     p = defaults
     p.input_modality = {"inputs": (registry.Modalities.IMAGE, None)}
     encoder = self._encoders["targets"]

tensor2tensor/data_generators/imdb.py

Lines changed: 2 additions & 2 deletions
@@ -97,7 +97,7 @@ def generate_data(self, data_dir, tmp_dir, task_id=-1):
         self.generator(data_dir, tmp_dir, True), train_paths,
         self.generator(data_dir, tmp_dir, False), dev_paths)
 
-  def hparams(self, defaults, model_hparams):
+  def hparams(self, defaults, unused_model_hparams):
     p = defaults
     source_vocab_size = self._encoders["inputs"].vocab_size
     p.input_modality = {
@@ -112,7 +112,7 @@ def feature_encoders(self, data_dir):
     encoder = text_encoder.SubwordTextEncoder(vocab_filename)
     return {
         "inputs": encoder,
-        "targets": text_encoder.TextEncoder(),
+        "targets": text_encoder.ClassLabelEncoder(["neg", "pos"]),
     }
 
   def example_reading_spec(self):
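
Review note: several hunks in this commit add a `class_labels` property and hand the resulting list to `text_encoder.ClassLabelEncoder` (for example `["neg", "pos"]` for IMDB). The diff does not show that class's implementation, so the following is only a rough sketch of the idea, assuming the encoder maps between integer class ids and label strings by list position. `SimpleClassLabelEncoder` is a hypothetical stand-in, not the tensor2tensor class.

```python
class SimpleClassLabelEncoder(object):
  """Hypothetical stand-in; not tensor2tensor's ClassLabelEncoder."""

  def __init__(self, class_labels):
    # Keep a private copy so later changes to the caller's list have no effect.
    self._class_labels = list(class_labels)

  def encode(self, label_str):
    # Map a label string to its integer class id (its list position).
    return self._class_labels.index(label_str)

  def decode(self, label_id):
    # Map an integer class id back to its human-readable label.
    return self._class_labels[label_id]

  @property
  def vocab_size(self):
    return len(self._class_labels)


# Label lists taken from the hunks above (IMDB and the 10-class digit problem).
imdb_labels = SimpleClassLabelEncoder(["neg", "pos"])
digit_labels = SimpleClassLabelEncoder([str(c) for c in range(10)])
```

With a mapping like this, a predicted class id can be reported as a readable label rather than a bare integer.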
