# Deepfake Detection with Improved EfficientViT
This repository contains a PyTorch model for deepfake detection based on an improved EfficientViT architecture, trained on video data.
The model predicts whether a video is real (0) or fake (1) using both visual information and temporal cues.
## Model Description
- **Architecture:** Improved EfficientViT
- **Backbone:** EfficientNet-B0 for feature extraction
- **Head:** Transformer-based temporal modeling with a classification head
- **Input:** Video frames (224×224 RGB images)
- **Output:** Binary label (0 = Real, 1 = Fake) and frame-level probabilities
Key Features:
- Extracts faces from frames using MTCNN (a minimal face-cropping sketch follows this list)
- Supports inference on raw video files
- Provides frame-level probabilities for fine-grained analysis
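The face-extraction step itself is implemented in `inference.py` and not shown in this card. The sketch below illustrates how frames could be sampled and cropped with MTCNN before being passed to the model; it assumes the `facenet-pytorch` implementation of MTCNN, and the function name `extract_face_crops` and the `every_n_frames` sampling parameter are illustrative, not part of the repository API.

```python
# Illustrative face-cropping sketch (assumes facenet-pytorch's MTCNN);
# the actual preprocessing lives in inference.py.
import cv2
import torch
from PIL import Image
from facenet_pytorch import MTCNN

device = "cuda" if torch.cuda.is_available() else "cpu"
mtcnn = MTCNN(image_size=224, margin=20, post_process=False, device=device)

def extract_face_crops(video_path, every_n_frames=10):
    """Return a list of 224x224 face tensors sampled from the video."""
    crops = []
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n_frames == 0:
            # OpenCV decodes frames as BGR; convert to RGB before detection
            rgb = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            face = mtcnn(rgb)  # returns None if no face is detected
            if face is not None:
                crops.append(face)
        idx += 1
    cap.release()
    return crops
```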
## Repository Structure
```
deepfake-efficientvit/
│
├── model.py           # ImprovedEfficientViT class
├── inference.py       # Functions to run inference on videos
├── model.pth          # Trained weights
├── config.json        # Optional model metadata
├── requirements.txt   # Required packages
└── README.md
```
## Installation
```bash
git clone https://huggingface.co/faisalishfaq2005/deepfake-detection-efficientnet-vit
cd deepfake-detection-efficientnet-vit
pip install -r requirements.txt
```
## Usage
### 1. Programmatic Inference
```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
import torch
from model import ImprovedEfficientViT
from inference import predict_vedio

# 1. Download the checkpoint from Hugging Face
checkpoint_path = hf_hub_download(
    repo_id="faisalishfaq2005/deepfake-detection-efficientnet-vit",
    filename="model.safetensors"
)

# 2. Load the model weights safely
state_dict = load_file(checkpoint_path, device="cpu")
model = ImprovedEfficientViT()
model.load_state_dict(state_dict)
model.eval()

# 3. Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# 4. Run inference on a video
video_path = "sample_video.mp4"
result = predict_vedio(video_path, model)
print(result)
# Example output: {'class': 1}
```
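To score several clips in one pass, the same function can simply be looped over a folder. The snippet below only reuses `predict_vedio` and the `model` from the example above, and assumes each result dict contains the `'class'` key shown in the example output; the `videos/` folder name is illustrative.

```python
# Batch-scoring sketch: run predict_vedio over every .mp4 in a folder.
from pathlib import Path

labels = {0: "real", 1: "fake"}
for video_file in sorted(Path("videos/").glob("*.mp4")):
    result = predict_vedio(str(video_file), model)
    print(f"{video_file.name}: {labels[result['class']]}")
```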
### 2. Manual Download
1. Go to the [Hugging Face model page](https://huggingface.co/faisalishfaq2005/deepfake-detection-efficientnet-vit).
2. Download `model.pth`, `model.py`, and `inference.py`.
3. Place them in the same local folder.
4. Install the requirements and run `predict_vedio()`, as sketched below.
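A minimal local-loading sketch for the manually downloaded files is shown below. It assumes `model.pth` stores a plain `state_dict` (saved via `torch.save(model.state_dict(), ...)`); if it is a full checkpoint, adjust the loading step accordingly.

```python
# Local inference sketch using the manually downloaded files;
# assumes model.pth contains a state_dict, not a pickled model object.
import torch
from model import ImprovedEfficientViT
from inference import predict_vedio

device = "cuda" if torch.cuda.is_available() else "cpu"
model = ImprovedEfficientViT()
state_dict = torch.load("model.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.to(device).eval()

print(predict_vedio("sample_video.mp4", model))
```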
## License
This model is released under the MIT License. You are free to use, modify, and distribute it, with attribution.
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{faisalishfaq2025efficientvit,
  title  = {Deepfake Detection with EfficientNet and ViT},
  author = {Faisal Ishfaq},
  year   = {2025}
}
```
## Evaluation Results

| Metric    | Dataset                   | Value (self-reported) |
|-----------|---------------------------|-----------------------|
| Accuracy  | FaceForensics++, Celeb-DF | 0.886                 |
| Precision | FaceForensics++, Celeb-DF | 0.892                 |
| Recall    | FaceForensics++, Celeb-DF | 0.879                 |
| F1-score  | FaceForensics++, Celeb-DF | 0.886                 |


