
DeepFake Detector V13 🎯

State-of-the-art deepfake detection ensemble with 699M parameters


🚀 Performance Highlights

  • Average F1 across the three models: 0.9313
  • Best Model F1: 0.9586 (Model 13.3 - Swin-Large)
  • Total Parameters: 699M (exceeds 500M requirement ✅)
  • Training Time: ~6.1 hours total on a T4 GPU

📊 Architecture

The ensemble consists of three large-scale transformer and CNN models trained sequentially:

| Model | Backbone | Parameters | F1 Score | Training Time |
|-------|----------|------------|----------|---------------|
| Model 13.1 | ConvNeXt-Large | 198M | 0.8971 | 205.7 min |
| Model 13.2 | ViT-Large | 304M | 0.9382 | 52.7 min |
| Model 13.3 | Swin-Large | 197M | 0.9586 | 106.2 min |

Total: 699M parameters

Model Files

  • model_1.safetensors - ConvNeXt-Large (752 MB)
  • model_2.safetensors - ViT-Large (1159 MB)
  • model_3.safetensors - Swin-Large (747 MB)
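
The weights can be fetched from the Hub before running the snippets below; a minimal sketch using huggingface_hub, assuming the repository id from the citation at the end of this card (ash12321/deepfake-detector-v13):

from huggingface_hub import hf_hub_download

# Repository id assumed from the citation URL at the end of this card
repo_id = 'ash12321/deepfake-detector-v13'
weight_files = ['model_1.safetensors', 'model_2.safetensors', 'model_3.safetensors']

# Download (or reuse the local cache) and collect the resolved file paths
local_paths = [hf_hub_download(repo_id=repo_id, filename=f) for f in weight_files]
print(local_paths)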

🎯 Usage

Installation

pip install torch torchvision timm safetensors pillow

Quick Start - Single Model

import torch
import timm
from PIL import Image
from torchvision import transforms
from safetensors.torch import load_file

# Define model architecture
class DeepfakeDetector(torch.nn.Module):
    def __init__(self, backbone_name, dropout=0.3):
        super().__init__()
        self.backbone = timm.create_model(backbone_name, pretrained=False, num_classes=0)
        
        if hasattr(self.backbone, 'num_features'):
            feat_dim = self.backbone.num_features
        else:
            with torch.no_grad():
                feat_dim = self.backbone(torch.randn(1, 3, 224, 224)).shape[1]
        
        self.classifier = torch.nn.Sequential(
            torch.nn.Linear(feat_dim, 512),
            torch.nn.BatchNorm1d(512),
            torch.nn.GELU(),
            torch.nn.Dropout(dropout),
            torch.nn.Linear(512, 128),
            torch.nn.BatchNorm1d(128),
            torch.nn.GELU(),
            torch.nn.Dropout(dropout * 0.5),
            torch.nn.Linear(128, 1)
        )
    
    def forward(self, x):
        features = self.backbone(x)
        return self.classifier(features).squeeze(-1)

# Load best model (Model 13.3 - Swin-Large)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = DeepfakeDetector('swin_large_patch4_window7_224', dropout=0.3)
state_dict = load_file('model_3.safetensors')
model.load_state_dict(state_dict)
model = model.to(device)
model.eval()

# Preprocessing
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Predict
image = Image.open('test_image.jpg').convert('RGB')
input_tensor = transform(image).unsqueeze(0).to(device)

with torch.no_grad():
    logits = model(input_tensor)
    probability = torch.sigmoid(logits).item()
    prediction = 'FAKE' if probability > 0.5 else 'REAL'

print(f"Prediction: {prediction}")
print(f"Confidence: {probability:.2%}")

Full Ensemble (Recommended)

import torch
import timm
from PIL import Image
from torchvision import transforms
from safetensors.torch import load_file

class DeepfakeDetector(torch.nn.Module):
    def __init__(self, backbone_name, dropout=0.3):
        super().__init__()
        self.backbone = timm.create_model(backbone_name, pretrained=False, num_classes=0)
        
        if hasattr(self.backbone, 'num_features'):
            feat_dim = self.backbone.num_features
        else:
            with torch.no_grad():
                feat_dim = self.backbone(torch.randn(1, 3, 224, 224)).shape[1]
        
        self.classifier = torch.nn.Sequential(
            torch.nn.Linear(feat_dim, 512),
            torch.nn.BatchNorm1d(512),
            torch.nn.GELU(),
            torch.nn.Dropout(dropout),
            torch.nn.Linear(512, 128),
            torch.nn.BatchNorm1d(128),
            torch.nn.GELU(),
            torch.nn.Dropout(dropout * 0.5),
            torch.nn.Linear(128, 1)
        )
    
    def forward(self, x):
        features = self.backbone(x)
        return self.classifier(features).squeeze(-1)

# Model configurations
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

configs = [
    ('convnext_large', 0.3, 'model_1.safetensors'),
    ('vit_large_patch16_224', 0.35, 'model_2.safetensors'),
    ('swin_large_patch4_window7_224', 0.3, 'model_3.safetensors')
]

# Load all models
models = []
for backbone, dropout, filename in configs:
    model = DeepfakeDetector(backbone, dropout)
    state_dict = load_file(filename)
    model.load_state_dict(state_dict)
    model = model.to(device)
    model.eval()
    models.append(model)

print(f"✓ Loaded {len(models)} models")

# Preprocessing
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Ensemble prediction
def predict_ensemble(image_path):
    image = Image.open(image_path).convert('RGB')
    input_tensor = transform(image).unsqueeze(0).to(device)
    
    predictions = []
    with torch.no_grad():
        for model in models:
            logits = model(input_tensor)
            prob = torch.sigmoid(logits).item()
            predictions.append(prob)
    
    # Average ensemble
    avg_prob = sum(predictions) / len(predictions)
    prediction = 'FAKE' if avg_prob > 0.5 else 'REAL'
    
    return {
        'prediction': prediction,
        'confidence': avg_prob,
        'individual_predictions': predictions
    }

# Use it
result = predict_ensemble('test_image.jpg')
print(f"Prediction: {result['prediction']}")
print(f"Ensemble Confidence: {result['confidence']:.2%}")
print(f"Individual Models: {[f'{p:.2%}' for p in result['individual_predictions']]}")

📈 Training Details

Architecture Design

Each model uses:

  • Backbone: Large pre-trained vision model (frozen initially, fine-tuned)
  • Classifier Head:
    • Linear(feat_dim → 512) + BatchNorm + GELU + Dropout
    • Linear(512 → 128) + BatchNorm + GELU + Dropout
    • Linear(128 β†’ 1)

Training Configuration

  • Loss Function: Focal Loss with Label Smoothing (a configuration sketch follows this list)
    • Alpha: 0.25
    • Gamma: 2.5
    • Label Smoothing: 0.12
  • Optimizer: AdamW
    • Learning Rates: [2e-5, 1.5e-5, 1.8e-5]
    • Weight Decay: 3e-4
  • Scheduler: CosineAnnealingWarmRestarts (T_0=3, T_mult=2)
  • Epochs: 10 per model
  • Batch Sizes: [32, 24, 32]
  • Mixed Precision: FP16 enabled
  • Gradient Accumulation: 4 steps
  • Gradient Checkpointing: Enabled (memory efficiency)
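
A minimal sketch of how these settings could be wired up with the values listed above; the FocalLossWithSmoothing class below is an illustration of focal loss with label smoothing, not the original training code, and model refers to a DeepfakeDetector instance as defined in the usage snippets above.

import torch
import torch.nn.functional as F

class FocalLossWithSmoothing(torch.nn.Module):
    # Illustrative binary focal loss on logits with label smoothing
    # (an assumption about the setup, not the original implementation)
    def __init__(self, alpha=0.25, gamma=2.5, smoothing=0.12):
        super().__init__()
        self.alpha, self.gamma, self.smoothing = alpha, gamma, smoothing

    def forward(self, logits, targets):
        # Smooth hard 0/1 labels toward 0.5
        targets = targets * (1 - self.smoothing) + 0.5 * self.smoothing
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
        p = torch.sigmoid(logits)
        p_t = p * targets + (1 - p) * (1 - targets)                    # probability assigned to the (smoothed) target
        alpha_t = self.alpha * targets + (1 - self.alpha) * (1 - targets)
        return (alpha_t * (1 - p_t) ** self.gamma * bce).mean()

criterion = FocalLossWithSmoothing(alpha=0.25, gamma=2.5, smoothing=0.12)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=3e-4)   # 2e-5 is the first model's LR
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=3, T_mult=2)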

Data Augmentation

  • Random Horizontal Flip (p=0.5)
  • Random Rotation (±12°)
  • Color Jitter (brightness, contrast, saturation: ±0.15)
  • Normalization: ImageNet stats
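
In torchvision terms, the listed augmentations roughly correspond to a training pipeline like the sketch below; the exact training transform is not published here, so treat this as an approximation.

from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),                               # flip with p=0.5
    transforms.RandomRotation(degrees=12),                                # ±12°
    transforms.ColorJitter(brightness=0.15, contrast=0.15, saturation=0.15),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),   # ImageNet stats
])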

📊 Performance Analysis

Model Comparison

Model 13.1 (ConvNeXt-Large)

  • ✓ Solid baseline: F1 = 0.8971
  • ✓ CNN-based architecture
  • ✓ Good for local feature extraction

Model 13.2 (ViT-Large)

  • ✓ Strong performance: F1 = 0.9382
  • ✓ Fastest training (52.7 min)
  • ✓ Global attention mechanism

Model 13.3 (Swin-Large) ⭐ Best Model

  • ✓ Excellent performance: F1 = 0.9586
  • ✓ Hierarchical vision transformer
  • ✓ Best balance of accuracy and efficiency

Ensemble Benefits

The ensemble approach provides:

  • Improved Robustness: Different architectures capture different patterns
  • Reduced Variance: Averaging reduces prediction noise
  • Better Generalization: Complementary strengths minimize overfitting
  • Higher Accuracy: Expected ensemble F1 ≈ 0.94-0.96

🔧 System Requirements

Inference (Single Model)

  • GPU: 4GB+ VRAM
  • RAM: 8GB+
  • Storage: ~0.75-1.2 GB per model, depending on the backbone (see file sizes above)

Inference (Full Ensemble)

  • GPU: 12GB+ VRAM (or run the models sequentially on a smaller GPU; see the sketch below)
  • RAM: 16GB+
  • Storage: ~2.7 GB total
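
On a GPU that cannot hold all three models at once, one option is to load, run, and free each model in turn. A minimal sketch, reusing DeepfakeDetector, transform, configs, and device from the ensemble example above:

import torch
from PIL import Image
from safetensors.torch import load_file

def predict_sequential(image_path):
    image = Image.open(image_path).convert('RGB')
    input_tensor = transform(image).unsqueeze(0)

    probs = []
    for backbone, dropout, filename in configs:
        # Load one model at a time so only a single set of weights sits in VRAM
        model = DeepfakeDetector(backbone, dropout)
        model.load_state_dict(load_file(filename))
        model = model.to(device).eval()
        with torch.no_grad():
            probs.append(torch.sigmoid(model(input_tensor.to(device))).item())
        del model                        # drop this model's weights
        if device.type == 'cuda':
            torch.cuda.empty_cache()     # release cached blocks back to the GPU driver

    avg_prob = sum(probs) / len(probs)
    return 'FAKE' if avg_prob > 0.5 else 'REAL', avg_prob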

Training

  • GPU: T4 (16GB) or better
  • RAM: 12GB+
  • Storage: 8GB+ for checkpoints

📚 Dataset

Trained on: ash12321/deepfake-v13-dataset

📄 Citation

@misc{v13-deepfake-detector,
  title={DeepFake Detector V13: Large-Scale Ensemble},
  author={Ash},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/ash12321/deepfake-detector-v13}}
}

πŸ“ License

MIT License - See LICENSE file for details

πŸ™ Acknowledgments

  • Built with PyTorch, timm, and Hugging Face
  • Trained on Google Colab T4 GPU
  • Architectures: ConvNeXt (Meta), ViT (Google), Swin (Microsoft)

Model Version: 13.0
Last Updated: November 2024
Status: Production Ready ✅
