Citation
Hamadani, A., Kumari, R., Simon, W., Yadav, G., & Murray-Rust, P. (2025). Exploring Vision Transformers in Practice (0.1). Zenodo. https://doi.org/10.5281/zenodo.16734915
Description:
In this notebook, we fine-tune a pretrained Vision Transformer (ViT) on the Fashion Products Small dataset from Hugging Face, which contains 42,700 e-commerce images of apparel and accessories (e.g., shirts, watches) along with their metadata. The data is split into training, validation, and test sets. Rather than training from scratch, the notebook starts from the ViT-Base-Patch16-224-in21k checkpoint, loaded through Hugging Face’s Transformers library.
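As a rough illustration of what the checkpoint name encodes, the sketch below parses the patch size (16) and input resolution (224) from the name and derives how many patches the model sees per image; the `in21k` suffix indicates ImageNet-21k pretraining. The Hub ID `google/vit-base-patch16-224-in21k`, the helper names, and the `num_labels` argument are assumptions made for illustration and are not taken from the notebook itself.

```python
import re

# Assumed Hugging Face Hub ID for the checkpoint named in the description.
MODEL_ID = "google/vit-base-patch16-224-in21k"


def patch_geometry(model_id: str) -> tuple[int, int, int]:
    """Return (patch_size, image_size, num_patches) parsed from a ViT checkpoint name."""
    match = re.search(r"patch(\d+)-(\d+)", model_id)
    if match is None:
        raise ValueError(f"no patch/resolution info in {model_id!r}")
    patch_size, image_size = int(match.group(1)), int(match.group(2))
    per_side = image_size // patch_size  # patches along one image edge
    return patch_size, image_size, per_side * per_side


def load_vit_for_finetuning(num_labels: int):
    """Hypothetical helper: load the pretrained backbone with a fresh classification head.

    Requires the `transformers` package and network access, so the import is
    deferred until the function is actually called.
    """
    from transformers import ViTForImageClassification

    return ViTForImageClassification.from_pretrained(MODEL_ID, num_labels=num_labels)


if __name__ == "__main__":
    patch, size, n_patches = patch_geometry(MODEL_ID)
    # ViT prepends one [CLS] token to the patch sequence.
    print(f"{size}x{size} image, {patch}x{patch} patches: "
          f"{n_patches} patches, sequence length {n_patches + 1}")
```

Calling `load_vit_for_finetuning` with the number of product categories in the chosen label column would give a model whose classification head is newly initialized and ready for fine-tuning; the actual label count depends on how the notebook prepares the dataset.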
Vision Transformers in Academic Research
- Understanding Visual Texts
- Analyzing Literature Reviews
- Multimodal Literature Review
- Sorting Large Collections
- Symbolism Analysis in Art
- Visual Semantic Mapping
Reviewers & review process: <Add reviewers and review process link>
Software citation information: CITATION.cff
License: Apache License, Version 2.0 (January 2004), http://www.apache.org/licenses/ | License information: LICENSE