language: en
library_name: pytorch
license: mit
tags:
  - deepfake-detection
  - image-classification
  - video-analysis
  - efficientvit
  - pytorch
pipeline_tag: image-classification
safetensors:
  total: 1
  format: safetensors
  weight_dtype: float32
  size_in_bytes: 80000000
model-index:
  - name: Deepfake Detection with Improved EfficientViT
    results:
      - task:
          type: image-classification
          name: Deepfake Detection
        dataset:
          type: custom
          name: FaceForensics++, Celeb-DF
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.8864
          - name: Precision
            type: precision
            value: 0.892
          - name: Recall
            type: recall
            value: 0.8792
          - name: F1-score
            type: f1
            value: 0.8856
    config: config.json
    metadata:
      model_type: EfficientViT
      num_parameters: 20026725
      precision: float32
      framework: pytorch
      license: mit
      model_format: safetensors
      size: 82MB

Deepfake Detection with Improved EfficientViT

[Figure: Model Architecture]

[Figure: Inference Pipeline]

This repository contains a PyTorch model for deepfake detection based on an improved EfficientViT architecture, trained on video data.

The model predicts whether a video is real (0) or fake (1) using both visual information and temporal cues.


🧩 Model Description

Architecture: Improved EfficientViT
Backbone: EfficientNet-B0 for feature extraction
Head: Transformer-based temporal modeling with classification head
Input: Video frames (224×224 RGB images)
Output: Binary label (0=Real, 1=Fake) and frame-level probabilities
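Because the input is a sequence of video frames, a clip usually has to be reduced to a manageable number of frames before inference. The sketch below shows one common approach, evenly spaced sampling. The function name sample_frame_indices and the default of 16 frames are hypothetical and are not taken from inference.py.

```python
def sample_frame_indices(total_frames, num_samples=16):
    """Return up to num_samples evenly spaced frame indices in [0, total_frames).

    Hypothetical helper: the sampling rule is an assumption for illustration.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    if total_frames <= num_samples:
        # Short clip: use every frame.
        return list(range(total_frames))
    step = total_frames / num_samples
    # Take the frame at the centre of each of the num_samples equal segments.
    return [int(step * i + step / 2) for i in range(num_samples)]


print(sample_frame_indices(300, 4))  # [37, 112, 187, 262]
```

Each selected frame would then be cropped to the detected face and resized to 224×224 before being passed to the model.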

Key Features:

  • Extracts faces from frames using MTCNN
  • Supports inference on raw video files
  • Provides frame-level probabilities for fine-grained analysis
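One way frame-level probabilities could be combined into a single video-level label is to average them and apply a threshold. The sketch below is an illustration only: the name aggregate_frame_probs, the mean rule, and the 0.5 threshold are assumptions, not the repository's confirmed logic.

```python
from statistics import fmean


def aggregate_frame_probs(frame_probs, threshold=0.5):
    """Combine per-frame fake probabilities into one video-level decision.

    Hypothetical helper: the mean rule and the 0.5 threshold are assumptions.
    """
    if not frame_probs:
        raise ValueError("no frame probabilities given (were any faces detected?)")
    mean_prob = fmean(frame_probs)
    # Label convention from the model description: 0 = Real, 1 = Fake.
    return {"class": int(mean_prob >= threshold), "mean_fake_prob": mean_prob}


print(aggregate_frame_probs([0.9, 0.8, 0.4]))
```

Keeping the per-frame values alongside the final label makes it possible to inspect which parts of a video drove the decision.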

📁 Repository Structure

deepfake-efficientvit/
│
├── model.py                  # ImprovedEfficientViT class
├── inference.py              # Functions to run inference on videos
├── model.pth                 # Trained weights
├── config.json               # Optional model metadata
├── requirements.txt          # Required packages
└── README.md

⚡ Installation

git clone https://huggingface.co/faisalishfaq2005/deepfake-detection-efficientnet-vit

cd deepfake-detection-efficientnet-vit

pip install -r requirements.txt
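After installing, a quick check that the key packages are importable can save debugging time later. The sketch below uses only the standard library. The package list is taken from the imports in this README's usage example and is an assumption about what matters most; requirements.txt may list more.

```python
from importlib.util import find_spec

# Packages imported by the usage example in this README (an assumed subset
# of requirements.txt).
REQUIRED = ("torch", "safetensors", "huggingface_hub")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
print("Missing packages:", missing if missing else "none")
```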

🚀 Usage

1. Programmatic Inference


from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
import torch
from model import ImprovedEfficientViT
from inference import predict_vedio 

# 1️⃣ Download the checkpoint from Hugging Face
checkpoint_path = hf_hub_download(
    repo_id="faisalishfaq2005/deepfake-detection-efficientnet-vit",  
    filename="model.safetensors"
)

# 2️⃣ Load the model weights safely
state_dict = load_file(checkpoint_path, device="cpu")
model = ImprovedEfficientViT()
model.load_state_dict(state_dict)
model.eval()

# 3️⃣ Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# 4️⃣ Run inference on a video
video_path = "sample_video.mp4"
result = predict_vedio(video_path, model)
print(result)
# Example Output: {'class': 1}
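The returned dictionary can be mapped to a readable label. The sketch below assumes the result has a 'class' key holding 0 or 1, as in the example output above. The names LABELS and describe_result are hypothetical helpers, not part of inference.py.

```python
# Label convention stated in the model description: 0 = Real, 1 = Fake.
LABELS = {0: "Real", 1: "Fake"}


def describe_result(result):
    """Turn a prediction dict such as {'class': 1} into a readable string.

    Hypothetical helper: assumes the 'class' key shown in the example output.
    """
    label = result.get("class")
    if label not in LABELS:
        raise ValueError(f"unexpected prediction: {result!r}")
    return f"Predicted: {LABELS[label]} (class {label})"


print(describe_result({"class": 1}))  # Predicted: Fake (class 1)
```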

2. Manual Download

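Go to the Hugging Face model page and download the following files:

  • model.pth
  • model.py
  • inference.py

Place them in the same local folder, install the requirements, and run the prediction function from inference.py (imported as predict_vedio in the programmatic example above).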
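Before running inference from a manual download, it can help to confirm that all three files ended up in the same folder. The sketch below is a minimal check using the file names listed in these steps; the helper name missing_files is hypothetical.

```python
from pathlib import Path

# File names listed in the manual download steps.
EXPECTED_FILES = ("model.pth", "model.py", "inference.py")


def missing_files(folder=".", names=EXPECTED_FILES):
    """Return the expected files that are not present in `folder`."""
    root = Path(folder)
    return [name for name in names if not (root / name).is_file()]


print("Missing files:", missing_files() or "none")
```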

📄 License

This model is released under the MIT License. You are free to use, modify, and distribute it, provided the copyright and license notice are retained.

📚 Citation

If you use this model in your research, please cite:

@inproceedings{faisalishfaq2025efficientvit,
  title={Deepfake Detection with Efficientnet and ViT},
  author={Faisal Ishfaq},
  year={2025}
}