Deepfake Detection with Improved EfficientViT

[Figure: Model Architecture]

[Figure: Inference Pipeline]

This repository contains a PyTorch model for deepfake detection based on an improved EfficientViT architecture, trained on video data.

The model predicts whether a video is real (0) or fake (1) using both visual information and temporal cues.


🧩 Model Description

Architecture: Improved EfficientViT
Backbone: EfficientNet-B0 for feature extraction
Head: Transformer-based temporal modeling with classification head
Input: Video frames (224×224 RGB images)
Output: Binary label (0=Real, 1=Fake) and frame-level probabilities
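
The backbone-plus-temporal-head layout described above can be illustrated with a minimal PyTorch sketch. This is not the repository's `ImprovedEfficientViT`: the class name `FrameTransformerSketch`, the layer sizes, and the mean pooling over time are all assumptions, a tiny convolutional stack stands in for EfficientNet-B0, and positional embeddings are omitted for brevity.

```python
import torch
from torch import nn


class FrameTransformerSketch(nn.Module):
    """Hypothetical stand-in: per-frame CNN features -> temporal transformer -> logit."""

    def __init__(self, feat_dim: int = 64, num_heads: int = 4, num_layers: int = 2):
        super().__init__()
        # Tiny conv stack standing in for the EfficientNet-B0 backbone (assumption).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),  # one feature vector per frame
        )
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim,
            nhead=num_heads,
            dim_feedforward=2 * feat_dim,
            batch_first=True,
        )
        self.temporal = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.classifier = nn.Linear(feat_dim, 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, height, width)
        b, t, c, h, w = frames.shape
        feats = self.backbone(frames.reshape(b * t, c, h, w)).flatten(1)
        feats = self.temporal(feats.reshape(b, t, -1))  # attend across frames
        return self.classifier(feats.mean(dim=1)).squeeze(-1)  # (batch,) logits


model = FrameTransformerSketch().eval()
clip = torch.randn(2, 8, 3, 224, 224)  # 2 clips of 8 frames, 224x224 RGB
with torch.no_grad():
    fake_prob = torch.sigmoid(model(clip))  # probability of class 1 (fake)
```

Flattening the batch and time axes lets the image backbone process every frame independently, and the transformer then mixes information across frames before a single logit is produced per clip.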

Key Features:

  • Extracts faces from frames using MTCNN
  • Supports inference on raw video files
  • Provides frame-level probabilities for fine-grained analysis
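
One way frame-level probabilities might be combined into a single video-level label is sketched below. The averaging rule, the 0.5 threshold, the function name `aggregate_frame_probs`, and the `fake_probability` key are assumptions made for illustration; the repository's own aggregation in `inference.py` may differ.

```python
from statistics import fmean


def aggregate_frame_probs(frame_probs, threshold=0.5):
    """Average per-frame fake probabilities and threshold the mean (hypothetical rule)."""
    if not frame_probs:
        raise ValueError("no frame probabilities to aggregate")
    video_prob = fmean(frame_probs)
    # 0 = Real, 1 = Fake, matching the label convention in the model description.
    return {"class": int(video_prob >= threshold), "fake_probability": video_prob}


result = aggregate_frame_probs([0.9, 0.8, 0.3, 0.7])
```

Averaging is only one option; a median or a vote over frames would be less sensitive to a few outlier frames.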

📁 Repository Structure

deepfake-efficientvit/
│
├── model.py                  # ImprovedEfficientViT class
├── inference.py              # Functions to run inference on videos
├── model.pth                 # Trained weights
├── config.json               # Optional model metadata
├── requirements.txt          # Required packages
└── README.md

⚡ Installation

git clone https://huggingface.co/faisalishfaq2005/deepfake-detection-efficientnet-vit

cd deepfake-detection-efficientnet-vit

pip install -r requirements.txt

🚀 Usage

1. Programmatic Inference


from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
import torch
from model import ImprovedEfficientViT
from inference import predict_vedio 

# 1️⃣ Download the checkpoint from Hugging Face
checkpoint_path = hf_hub_download(
    repo_id="faisalishfaq2005/deepfake-detection-efficientnet-vit",  
    filename="model.safetensors"
)

# 2️⃣ Load the model weights safely
state_dict = load_file(checkpoint_path, device="cpu")
model = ImprovedEfficientViT()
model.load_state_dict(state_dict)
model.eval()

# 3️⃣ Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# 4️⃣ Run inference on a video
video_path = "sample_video.mp4"
result = predict_vedio(video_path, model)
print(result)
# Example Output: {'class': 1}

2. Manual Download

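  • Go to the Hugging Face model page.
  • Download model.pth, model.py, and inference.py.
  • Place them in the same local folder.
  • Install the requirements, then call the prediction function from inference.py (imported as predict_vedio in the example above) on your video.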
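
For the manual route, loading a locally saved `.pth` file could look roughly like the sketch below. It assumes `model.pth` holds a plain PyTorch state dict, possibly wrapped under a `"state_dict"` key; the checkpoint's actual layout is not documented here, and the helper name `load_local_weights` is hypothetical.

```python
import torch
from torch import nn


def load_local_weights(model: nn.Module, checkpoint_path: str) -> nn.Module:
    """Load a locally saved state dict into `model` and switch it to eval mode."""
    # map_location="cpu" lets a checkpoint saved on a GPU load on a CPU-only machine.
    state = torch.load(checkpoint_path, map_location="cpu")
    # Some training scripts wrap the weights; "state_dict" is an assumed key name.
    if isinstance(state, dict) and "state_dict" in state:
        state = state["state_dict"]
    model.load_state_dict(state)
    return model.eval()


# Intended use with the downloaded files (not executed here):
# from model import ImprovedEfficientViT
# model = load_local_weights(ImprovedEfficientViT(), "model.pth")
```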

📄 License

This model is released under the MIT License. You are free to use, modify, and distribute it, with attribution.

📚 Citation

If you use this model in your research, please cite:

@inproceedings{faisalishfaq2025efficientvit,
  title={Deepfake Detection with Efficientnet and ViT},
  author={Faisal Ishfaq},
  year={2025}
}