|
4103 | 4103 | "id": "45a65cda-db08-441c-9f60-cf79138e029d"
|
4104 | 4104 | },
|
4105 | 4105 | "source": [
|
4106 |      | - "Then we'll setup device-agonistc code."
     | 4106 | + "Then we'll setup device-agnostic code."
4107 | 4107 | ]
|
4108 | 4108 | },
|
4109 | 4109 | {
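
For reference, the device-agnostic setup this line refers to is usually a one-liner. A minimal sketch (standard PyTorch, not copied from the notebook itself):

```python
import torch

# Use a GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)
```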
|
|
4327 | 4327 | "source": [
|
4328 | 4328 | "Finally, we'll transform our images into tensors and turn the tensors into DataLoaders.\n",
|
4329 | 4329 | "\n",
|
4330 |      | - "Since we're using a pretrained model form `torchvision.models` we can call the `transforms()` method on it to get its required transforms.\n",
     | 4330 | + "Since we're using a pretrained model from `torchvision.models` we can call the `transforms()` method on it to get its required transforms.\n",
4331 | 4331 | "\n",
|
4332 | 4332 | "Remember, if you're going to use a pretrained model, it's generally important to **ensure your own custom data is transformed/formatted in the same way the data the original model was trained on**.\n",
|
4333 | 4333 | "\n",
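
For context, getting the required transforms from a set of pretrained weights looks like the sketch below. The specific weights enum (`ViT_B_16_Weights`) is an assumption, since the model used isn't shown in this hunk:

```python
import torchvision

# Assumption: a ViT-B/16 feature extractor; substitute the weights the notebook actually uses
weights = torchvision.models.ViT_B_16_Weights.DEFAULT

# Get the transforms the pretrained weights were trained with
pretrained_transforms = weights.transforms()
print(pretrained_transforms)
```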
|
|
4372 | 4372 | "source": [
|
4373 | 4373 | "And now we've got transforms ready, we can turn our images into DataLoaders using the `data_setup.create_dataloaders()` method we created in [05. PyTorch Going Modular section 2](https://www.learnpytorch.io/05_pytorch_going_modular/#2-create-datasets-and-dataloaders-data_setuppy).\n",
|
4374 | 4374 | "\n",
|
4375 |      | - "Since we're using a feature extractor model (less trainable parameters), we could increase the batch size to a higher value (if we set it to 1024, we'd be mimicing an improvement found in [*Better plain ViT baselines for ImageNet-1k*](https://arxiv.org/abs/2205.01580), a paper which improves upon the original ViT paper and suggested extra reading). But since we only have ~200 training samples total, we'll stick with 32."
     | 4375 | + "Since we're using a feature extractor model (less trainable parameters), we could increase the batch size to a higher value (if we set it to 1024, we'd be mimicking an improvement found in [*Better plain ViT baselines for ImageNet-1k*](https://arxiv.org/abs/2205.01580), a paper which improves upon the original ViT paper and suggested extra reading). But since we only have ~200 training samples total, we'll stick with 32."
4376 | 4376 | ]
|
4377 | 4377 | },
|
4378 | 4378 | {
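
For context, a sketch of the DataLoader creation this hunk describes. The import path, directory paths and the exact `create_dataloaders()` signature are assumptions based on 05. PyTorch Going Modular:

```python
from going_modular.going_modular import data_setup  # assumed import path from the course repo

# Placeholder paths for the pizza/steak/sushi image folders
train_dir = "data/pizza_steak_sushi/train"
test_dir = "data/pizza_steak_sushi/test"

# Assumed signature: create_dataloaders(train_dir, test_dir, transform, batch_size, ...)
train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(
    train_dir=train_dir,
    test_dir=test_dir,
    transform=pretrained_transforms,  # transforms required by the pretrained weights
    batch_size=32                     # ~200 training samples total, so keep the batch size small
)
```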
|
|
4649 | 4649 | "\n",
|
4650 | 4650 | "> **Note:** ^ the EffNetB2 model in reference was trained with 20% of pizza, steak and sushi data (double the amount of images) rather than the ViT feature extractor which was trained with 10% of pizza, steak and sushi data. An exercise would be to train the ViT feature extractor model on the same amount of data and see how much the results improve.\n",
|
4651 | 4651 | "\n",
|
4652 |      | - "The EffNetB2 model is ~11x smaller than the ViT model with similiar results for test loss and accuracy.\n",
     | 4652 | + "The EffNetB2 model is ~11x smaller than the ViT model with similar results for test loss and accuracy.\n",
4653 | 4653 | "\n",
|
4654 | 4654 | "However, the ViT model's results may improve more when trained with the same data (20% pizza, steak and sushi data).\n",
|
4655 | 4655 | "\n",
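
One quick way to sanity-check a "~11x smaller" comparison like this is to compare total parameter counts. A sketch assuming `effnetb2` and `vit` are the two instantiated feature extractor models from the notebook (the notebook may instead compare saved file sizes):

```python
# Compare model sizes by total parameter count
def count_params(model):
    return sum(p.numel() for p in model.parameters())

effnetb2_params = count_params(effnetb2)  # assumed EffNetB2 feature extractor instance
vit_params = count_params(vit)            # assumed ViT feature extractor instance

print(f"EffNetB2 parameters: {effnetb2_params:,}")
print(f"ViT parameters: {vit_params:,}")
print(f"ViT is ~{vit_params / effnetb2_params:.1f}x larger than EffNetB2")
```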
|
|