This is the official repository of Spotting tell-tale visual artifacts in face swapping videos: strengths and pitfalls of CNN detectors, presented at IWBF2025 and available on IEEE Xplore.
The trained models are available in the following OneDrive folder.
We make our novel FOWS dataset available for research purposes only. You can request access by filling out this Google form.
- install miniconda
- create the 'fows' environment with python 3.10
```bash
conda create -n fows python=3.10
conda activate fows
```
- clone the repository and install the requirements
```bash
# clone project (NOTE: update link)
git clone https://github.com/RickyZi/FOWS_test.git

# install project
cd FOWS_test

# activate the conda env
conda activate fows

# install the requirements
pip install -r requirements.txt
```
A simple demo explaining the whole pipeline of the project is available in Colab. You can use this demo to test the pre-trained models on the FOWS dataset or on your own videos.
Our FOWS dataset consists of a collection of original and manipulated videos of users performing actions that occlude portions of their face. To train the models, we extracted the users' faces from the videos and organized them into 'occluded' and 'non-occluded' sets. For ease of reproduction, we also made the preprocessed version of the FOWS dataset available. You can access it by filling out this Google form.
You can replicate this preprocessing by using the scripts available in the ./preprocessing/ folder:
- frames_and_faces_extraction.py applies MediaPipe's BlazeFace detector to detect and extract the faces from the videos (a minimal sketch of this step is shown below),
- fows_dataset_processing.py organizes the extracted face images into 'occluded' and 'non-occluded' faces. Please note that a manual revision of the results may be needed.
In our work we applied the same frame categorization preprocessing to the GOTCHA dataset using the ./preprocessing/gotcha_dataset_preprocessing.py script to organize occluded and non-occluded faces.
The same preprocessing applied to our FOWS dataset can also be used to prepare your own videos for testing with our pre-trained models.
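As a reference for the face-extraction step, here is a minimal sketch based on MediaPipe's face-detection API and OpenCV; the function name, frame-sampling rate, and output naming are illustrative and do not necessarily match frames_and_faces_extraction.py:

```python
import cv2
import mediapipe as mp

def extract_faces(video_path, out_dir, every_n_frames=10):
    """Detect and crop faces from a video with MediaPipe's BlazeFace detector."""
    detector = mp.solutions.face_detection.FaceDetection(
        model_selection=0, min_detection_confidence=0.5
    )
    cap = cv2.VideoCapture(video_path)
    frame_idx = saved = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % every_n_frames == 0:
            results = detector.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            for det in results.detections or []:
                # relative bounding box -> pixel coordinates
                box = det.location_data.relative_bounding_box
                h, w = frame.shape[:2]
                x, y = max(int(box.xmin * w), 0), max(int(box.ymin * h), 0)
                crop = frame[y:y + int(box.height * h), x:x + int(box.width * w)]
                if crop.size:
                    cv2.imwrite(f"{out_dir}/face_{frame_idx}_{saved}.png", crop)
                    saved += 1
        frame_idx += 1
    cap.release()
```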
The code for training the models presented in the paper is provided in train.py.
You can train a specific model using the following command:
```bash
python train.py --model mnetv2 --train_dataset fows_occ --ft --tags mnetv2_fows_occ_FT
```
- model: defines the model backbone used for training
- MobileNetV2 (mnetv2)
- EfficientNetB4 (effnetb4)
- XceptionNet (xception)
- train_dataset: the dataset used for training (fows_occ, fows_no_occ)
- ft (or tl): the model training strategy (the difference is illustrated in the sketch after this list)
- ft: Fine-Tuning
- tl: Transfer Learning
- tags: defines the name of the folder where the model weights and the training logs will be saved
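To make the --ft / --tl distinction concrete, here is a minimal PyTorch sketch for the MobileNetV2 backbone; it is not the actual code from train.py, and the pre-trained weights and layer choices are assumptions:

```python
import torch.nn as nn
from torchvision import models

def build_mnetv2(strategy="ft", num_classes=2):
    """Build a MobileNetV2 binary classifier for fine-tuning (ft) or transfer learning (tl)."""
    model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
    if strategy == "tl":
        # transfer learning: freeze the pre-trained backbone, train only the new classifier
        for param in model.features.parameters():
            param.requires_grad = False
    # fine-tuning (ft): all layers stay trainable
    model.classifier[1] = nn.Linear(model.last_channel, num_classes)
    return model
```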
The code to perform inference of the trained models on a specific test_dataset is provided in test.py.
You can test a trained model on a specific dataset with the following command:
```bash
python test.py --model mnetv2 --train_dataset fows_occ --test_dataset fows_no_occ --tl --tags MnetV2_fows_occ_TL_vs_fows_no_occ
```
- model: name of the pre-trained model to use
- train_dataset: the dataset used for training the model (fows_occ, fows_no_occ)
- test_dataset: the dataset used for testing the model (fows_occ, fows_no_occ)
- ft (or tl): the model training strategy
- ft: Fine-Tuning
- tl: Transfer Learning
- tags: the name of the folder where the model inference results and logs will be saved
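As an illustration of what such an inference run involves, here is a minimal evaluation sketch; the checkpoint format, input size, normalization, and directory layout (a torchvision ImageFolder) are assumptions and may differ from what test.py actually does:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def evaluate(model, test_dir, ckpt_path, device="cuda"):
    """Run a trained detector on a test set and report frame-level accuracy."""
    tfm = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    loader = DataLoader(datasets.ImageFolder(test_dir, transform=tfm),
                        batch_size=32, shuffle=False)
    model.load_state_dict(torch.load(ckpt_path, map_location=device))
    model.to(device).eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total
```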
We also provide the code for computing GradCam activations for a given dataset in the gradcam.py script.
Example usage:
```bash
python gradcam.py --model mnetv2 --train_dataset fows_occ --test_dataset fows_no_occ --ft --cam_method gradcam++ --num-layers 1 --tags mnetv2_fows_occ_FT_vs_fows_no_occ
```
- model: name of the pre-trained model to use
- train_dataset: dataset used when training the model (used to select the pre-trained model)
- test_dataset: dataset used for computing GradCam activations (i.e. a random subset of the dataset)
- ft (or tl): training strategy
- ft: Fine Tuning
- tl: Transfer Learning
- cam_method: which gradcam method to apply (gradcam, gradcam++, eigencam, scorecam)
- num_layers (1, 2, or 3): how many layers to use for computing the GradCam output. One layer refers to the last convolutional layer of the model; with more than one layer, the GradCam activations are computed as the average of the layers' activations.
- tags: the name of the folder where to save the gradcam activations
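For reference, here is a minimal sketch of how such activations can be computed with the pytorch-grad-cam library; the target layer (MobileNetV2's last conv block), the class index, and the method mapping are assumptions and may not match gradcam.py:

```python
from pytorch_grad_cam import GradCAM, GradCAMPlusPlus
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget
from pytorch_grad_cam.utils.image import show_cam_on_image

def cam_for_image(model, input_tensor, rgb_img, method="gradcam++"):
    """Compute a CAM heatmap overlay for one preprocessed face image.

    input_tensor: (1, 3, H, W) normalized tensor; rgb_img: HxWx3 float image in [0, 1].
    """
    cam_cls = GradCAMPlusPlus if method == "gradcam++" else GradCAM
    # last conv block of MobileNetV2; other backbones expose it differently
    target_layers = [model.features[-1]]
    cam = cam_cls(model=model, target_layers=target_layers)
    # class index 1 assumed to be the manipulated ("fake") class
    grayscale = cam(input_tensor=input_tensor,
                    targets=[ClassifierOutputTarget(1)])[0]
    return show_cam_on_image(rgb_img, grayscale, use_rgb=True)
```

Passing more than one entry in target_layers makes the library average the activations across layers, which corresponds to the num_layers option described above.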
If your research uses our dataset, models, or code, in part or in full, please cite:
@INPROCEEDINGS{11113429,
author={Ziglio, Riccardo and Pasquini, Cecilia and Ranise, Silvio},
booktitle={2025 13th International Workshop on Biometrics and Forensics (IWBF)},
title={Spotting Tell-Tale Visual Artifacts in Face Swapping Videos: Strengths and Pitfalls of CNN Detectors},
year={2025},
volume={},
number={},
pages={01-06},
keywords={Biometrics;Visualization;Forensics;Soft sensors;Conferences;Detectors;Real-time systems;Data models;Faces;Videos;face swapping;face verification;remote video calls;forensic detection},
doi={10.1109/IWBF63717.2025.11113429}
}