SarcasmDiffusion — SDXL Fused Meme Generator

Model type: Stable Diffusion XL (Base 1.0) fine‑tuned via LoRA (merged/fused) to learn the visual style of sarcastic/ironic memes.
Author: Ricardo Urdaneta (github.com/Ricardouchub)


Overview

SarcasmDiffusion is a diffusion-based generative model focused on producing clean meme-style photographs that are suitable for caption overlays (text is added after generation). The model was LoRA‑fine‑tuned on a filtered and enriched subset of the Hateful Memes dataset to capture stylistic cues of humorous/ironic memes while avoiding offensive content.

  • Base: stabilityai/stable-diffusion-xl-base-1.0
  • Fine‑tuning: LoRA on the UNet only; VAE and text encoders are frozen.
  • Exported artifact: Fused SDXL (no external LoRA required at inference).

This model focuses on style transfer for meme aesthetics (composition, lighting, “stock-photo vibe”), not on rendering text inside images. Add titles/subtitles with your own overlay function or editor.


Intended Use

  • Generating meme-ready images with space at the top/bottom for captions.
  • Creative exploration of humorous/ironic visual setups controlled by prompts.
  • Educational/portfolio use for LoRA fine‑tuning workflows with SDXL.

Out of Scope / Limitations

  • No text rendering inside the image (explicitly discouraged via negative prompts).
  • May produce stock-like aesthetics by design.
  • Not suitable for generating or amplifying harmful, hateful, or NSFW content.
  • As with all text-to-image systems, prompts with ambiguous semantics can yield unpredictable outputs.

Training Summary

  • Base model: SDXL Base 1.0
  • LoRA rank / alpha / dropout: r=8, alpha=16, dropout=0.05
  • Resolution: 1024 (training); common inference at 768–896 for speed
  • Batch: 1 (gradient accumulation = 4)
  • Steps: ~9k (≈2 epochs on ~5k images)
  • Learning Rate: 0.0001
  • Precision: fp16 (LoRA params kept in fp32 during training)
  • Optimizer: AdamW
  • Scheduler: cosine with warmup (recommended)
  • Frozen: VAE, text_encoder, text_encoder_2
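
The figures above can be cross-checked with a little arithmetic. This sketch assumes that "steps" counts micro-batches (individual forward/backward passes), since that is the reading consistent with ≈2 epochs; if steps counted optimizer updates instead, 9k steps at an effective batch of 4 would be about 7 epochs.

```python
# Values taken from the Training Summary; "~9k" and "~5k" are approximate.
lora_rank, lora_alpha = 8, 16
micro_batch, grad_accum = 1, 4
micro_steps, num_images = 9_000, 5_000

# Standard LoRA scaling: the low-rank update is multiplied by alpha / r.
lora_scale = lora_alpha / lora_rank

# Number of samples that contribute to each optimizer update.
effective_batch = micro_batch * grad_accum

# ASSUMPTION: one "step" is one micro-batch, so each step sees `micro_batch` images.
epochs = micro_steps * micro_batch / num_images
optimizer_updates = micro_steps // grad_accum

print(f"LoRA scale (alpha / r): {lora_scale}")    # 2.0
print(f"Effective batch size:   {effective_batch}")  # 4
print(f"Approximate epochs:     {epochs:.1f}")    # 1.8
print(f"Optimizer updates:      {optimizer_updates}")  # 2250
```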

Data

  • Source: Hateful Memes (Facebook AI).
  • We excluded samples labeled hateful and applied NLP enrichment:
    • Emotion scoring (GoEmotions distilled) and irony scoring (RoBERTa‑irony).
    • Heuristics + percentiles → tones: humor / irony / neutral.
  • Final training CSV: prompts balanced by tone; negative prompts to avoid text overlays, low‑quality artifacts, watermarks/logos, and unsafe content.
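
The "heuristics + percentiles → tones" step could look roughly like the sketch below. It is an illustration only: the score keys `humor` and `irony`, the 75th-percentile cutoff, and the tie-breaking rule are assumptions, not the thresholds used for this model.

```python
from statistics import quantiles


def assign_tones(rows, pct=75):
    """Label each row "humor", "irony", or "neutral" using percentile cutoffs.

    rows: list of dicts with hypothetical keys "humor" and "irony" (scores in [0, 1]).
    """
    def cutoff(key):
        # quantiles(..., n=100) returns 99 cut points; index pct - 1 is the pct-th percentile.
        return quantiles([r[key] for r in rows], n=100, method="inclusive")[pct - 1]

    humor_cut, irony_cut = cutoff("humor"), cutoff("irony")
    tones = []
    for r in rows:
        high_humor = r["humor"] > humor_cut
        high_irony = r["irony"] > irony_cut
        if high_humor and high_irony:
            # Both scores are high: keep the stronger signal.
            tones.append("irony" if r["irony"] >= r["humor"] else "humor")
        elif high_irony:
            tones.append("irony")
        elif high_humor:
            tones.append("humor")
        else:
            tones.append("neutral")
    return tones


rows = [
    {"humor": 0.9, "irony": 0.1},
    {"humor": 0.1, "irony": 0.8},
    {"humor": 0.2, "irony": 0.2},
    {"humor": 0.3, "irony": 0.1},
]
print(assign_tones(rows))
```

Balancing the training CSV by tone would then amount to sampling a similar number of rows from each label.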

The dataset is not included here. Please obtain Hateful Memes under its original terms and reproduce the preprocessing if needed.


Safety, Ethics & Mitigations

  • Samples labeled hateful were filtered out, and negative prompts are used to steer away from NSFW content, hate, and text overlays.
  • Despite mitigations, misuse is possible. Users are responsible for prompting responsibly and complying with local laws and platform policies.
  • Do not use the model to create defamatory, harassing, discriminatory, or otherwise harmful imagery.

Known risks: dataset biases may remain; aesthetic biases (stock-photo look); occasional failure to respect negative prompts.


How to Use

import torch
from diffusers import AutoPipelineForText2Image

# Use fp16 on a GPU; fall back to fp32 on CPU, where half precision is poorly supported.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

pipe = AutoPipelineForText2Image.from_pretrained(
    "Ricardouchub/SarcasmDiffusion",
    torch_dtype=dtype,
).to(device)

prompt = (
    "sarcastic meme about checking the fridge for the third time, "
    "centered subject, plain background, high-contrast photo, stock photo style"
)
negative = "nsfw, hate speech, slur, watermark, logo, low quality, blurry, busy background, text overlay"

# A fixed seed makes the output reproducible.
generator = torch.Generator(device=device).manual_seed(123)
image = pipe(
    prompt,
    negative_prompt=negative,
    num_inference_steps=22,
    guidance_scale=6.3,
    width=896,
    height=896,
    generator=generator,
).images[0]

image.save("sample.png")

Prompting Tips

  • Add layout hints: “centered subject”, “plain background”, “space at top and bottom”.
  • Keep negative prompts to avoid logos/text/NSFW.
  • Use seeds for reproducibility; steps=18–28, guidance=5.5–7.5, size=768–1024.
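
The tips above can be bundled into a small helper that builds pipeline arguments and keeps them inside the suggested ranges. This is a sketch under assumptions: `build_request`, `LAYOUT_HINTS`, and `BASE_NEGATIVE` are hypothetical names, and the default values simply mirror the usage example.

```python
LAYOUT_HINTS = ("centered subject", "plain background", "space at top and bottom")
BASE_NEGATIVE = ("nsfw", "hate speech", "slur", "watermark", "logo", "text overlay")


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def build_request(idea, steps=22, guidance=6.3, size=896, extra_negative=()):
    """Assemble keyword arguments for the pipeline call from a short meme idea."""
    size = clamp(size, 768, 1024)
    size -= size % 8  # the pipeline expects width and height divisible by 8
    negative = list(BASE_NEGATIVE)
    negative += [term for term in extra_negative if term not in BASE_NEGATIVE]
    return {
        "prompt": ", ".join((idea,) + LAYOUT_HINTS),
        "negative_prompt": ", ".join(negative),
        "num_inference_steps": clamp(steps, 18, 28),
        "guidance_scale": clamp(guidance, 5.5, 7.5),
        "width": size,
        "height": size,
    }


kwargs = build_request("sarcastic meme about Monday meetings", extra_negative=("blurry",))
print(kwargs["prompt"])
```

With a loaded pipeline `pipe`, the result can be passed as `pipe(**kwargs, generator=torch.Generator(device=pipe.device).manual_seed(123)).images[0]`.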

Environment & Compatibility

To ensure full compatibility when loading this model (fused SDXL with LoRA merged), use the following library versions:

Library         | Recommended Version | Notes
----------------|---------------------|-----------------------------------------------------------
Python          | 3.10 – 3.12         | Tested on Colab (Python 3.12)
PyTorch         | 2.6.0 + CUDA 12.4   | Any CUDA ≥ 12 works
diffusers       | 0.35.1              | Core inference & model loading
transformers    | 4.45.2              | Required for SDXL CLIP text encoder (CLIPTextModel) compatibility
accelerate      | 1.10.1              | Device and fp16 inference management
huggingface_hub | 0.23.5              | Compatible with diffusers 0.35.x
safetensors     | ≥ 0.4.5             | For secure model weights loading

Install in Colab or local environment:

pip install "diffusers==0.35.1" "transformers==4.45.2" "accelerate==1.10.1" "huggingface_hub==0.23.5" "safetensors>=0.4.5"

Important:
Using newer versions (e.g., transformers ≥ 4.56) may break compatibility due to API changes in CLIPTextModel (offload_state_dict argument).
Always match the versions above for smooth loading.
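
Before loading the model, it can help to confirm that the environment matches the pinned versions. The sketch below uses only the standard library; the helper names are hypothetical, and it checks only the exactly pinned packages from the install command (PyTorch is left out because its version string usually carries a CUDA suffix).

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins from the install command above.
PINNED = {
    "diffusers": "0.35.1",
    "transformers": "4.45.2",
    "accelerate": "1.10.1",
    "huggingface_hub": "0.23.5",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(installed, pinned=PINNED):
    """Return {package: (installed, expected)} for every package that differs from its pin."""
    return {
        name: (installed.get(name), expected)
        for name, expected in pinned.items()
        if installed.get(name) != expected
    }


for name, (have, want) in find_mismatches(installed_versions(PINNED)).items():
    print(f"{name}: installed {have}, expected {want}")
```

If nothing is printed, every pinned package matches.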


License

  • Code: MIT
  • Model weights: follow the base model’s license (Stability AI / SDXL Base 1.0).
  • Data: Users must obtain Hateful Memes from its source and agree to its terms.

By using this model, you agree not to generate content that is illegal, harmful, or violates rights of others.


Evaluation

Qualitative assessment via fixed prompt sheets (humor/irony/neutral). Suggested automatic metrics for future work: CLIP‑score vs. caption, aesthetic predictors, and human preference studies.
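
A fixed prompt sheet can be expressed as a small grid of prompts and seeds, so that every checkpoint is compared on identical inputs. This is only a sketch: the prompts, seeds, and file-naming scheme are placeholders, not the sheet used to assess this model.

```python
from itertools import product

# ASSUMPTION: placeholder prompts; replace them with your own fixed sheet.
PROMPT_SHEET = {
    "humor": ["funny meme about a cat ignoring its owner, centered subject"],
    "irony": ["ironic meme about a 'quick' five-minute meeting, plain background"],
    "neutral": ["person holding a coffee mug, plain background, stock photo style"],
}
SEEDS = (0, 1, 2)


def sheet_jobs(sheet=PROMPT_SHEET, seeds=SEEDS):
    """Expand the sheet into one job per (tone, prompt, seed) with a stable filename."""
    jobs = []
    for tone, prompts in sheet.items():
        for (idx, prompt), seed in product(enumerate(prompts), seeds):
            jobs.append({
                "tone": tone,
                "prompt": prompt,
                "seed": seed,
                "filename": f"{tone}_{idx:02d}_seed{seed}.png",
            })
    return jobs


for job in sheet_jobs():
    print(job["filename"], "|", job["prompt"])
```

Each job can then be rendered with the pipeline using a generator seeded with `job["seed"]` and saved under `job["filename"]` for side-by-side review.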


Acknowledgments

  • Stability AI — SDXL Base 1.0
  • Hugging Face — Diffusers, Accelerate, PEFT
  • Facebook AI — Hateful Memes dataset