SarcasmDiffusion — SDXL Fused Meme Generator
Model type: Stable Diffusion XL (Base 1.0) fine‑tuned via LoRA (merged/fused) to learn the visual style of sarcastic/ironic memes.
Author: Ricardo Urdaneta (github.com/Ricardouchub)
Overview
SarcasmDiffusion is a diffusion-based generative model focused on producing clean meme-style photographs that are suitable for caption overlays (text is added after generation). The model was LoRA‑fine‑tuned on a filtered and enriched subset of the Hateful Memes dataset to capture stylistic cues of humorous/ironic memes while avoiding offensive content.
- Base: stabilityai/stable-diffusion-xl-base-1.0
- Fine‑tuning: LoRA on the UNet only; VAE and text encoders are frozen.
- Exported artifact: Fused SDXL (no external LoRA required at inference).
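To make "fused" concrete: in the standard LoRA formulation, each adapted weight gets a low-rank update scaled by alpha/r, and fusing folds that update into the weight once so inference needs no separate adapter. The sketch below checks that equivalence on a toy matrix with NumPy. The layer sizes and random values are made up for illustration, and this is not the export script used for this model.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in = 32, 24    # toy layer size (hypothetical; real UNet layers are far larger)
r, alpha = 8, 16        # rank and alpha from the Training Summary
scale = alpha / r       # standard LoRA scaling factor

W = rng.normal(size=(d_out, d_in))   # frozen base weight
A = rng.normal(size=(r, d_in))       # LoRA "down" projection
B = rng.normal(size=(d_out, r))      # LoRA "up" projection
x = rng.normal(size=(d_in,))         # one input vector

# Unfused: base path plus a separate low-rank adapter path
y_adapter = W @ x + scale * (B @ (A @ x))

# Fused: add the update to the weight once, then use a single matmul
W_fused = W + scale * (B @ A)
y_fused = W_fused @ x

print("outputs match:", np.allclose(y_adapter, y_fused))
```

Because the fused weight has the same shape as the original, the exported checkpoint loads like any other SDXL model.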
This model focuses on style transfer for meme aesthetics (composition, lighting, “stock-photo vibe”), not on rendering text inside images. Add titles/subtitles with your own overlay function or editor.
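Since captions are added after generation, here is a minimal sketch of one possible overlay function using Pillow. The function name `add_caption`, the font choice (DejaVu Sans Bold with a fallback to Pillow's built-in font), and the sizing heuristics are all assumptions for illustration, not part of this repository. It assumes a reasonably recent Pillow.

```python
from PIL import Image, ImageDraw, ImageFont


def _load_font(size):
    """Try a common bold TrueType font; fall back to Pillow's built-in font."""
    try:
        return ImageFont.truetype("DejaVuSans-Bold.ttf", size)
    except OSError:
        return ImageFont.load_default()


def _wrap(draw, text, font, max_width):
    """Greedy word wrap based on measured pixel width."""
    lines, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and draw.textlength(candidate, font=font) > max_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def add_caption(image, top="", bottom="", font_size=None, margin=0.04):
    """Return a copy of `image` with white, black-outlined captions."""
    out = image.copy()
    draw = ImageDraw.Draw(out)
    w, h = out.size
    size = font_size or max(12, h // 12)
    font = _load_font(size)
    pad = int(h * margin)
    line_h = int(size * 1.15)          # approximate line height
    stroke = max(1, size // 16)        # outline thickness

    def draw_block(text, from_top):
        lines = _wrap(draw, text.upper(), font, w - 2 * pad)
        y = pad if from_top else h - pad - line_h * len(lines)
        for line in lines:
            x = (w - draw.textlength(line, font=font)) / 2  # center horizontally
            draw.text((x, y), line, font=font, fill="white",
                      stroke_width=stroke, stroke_fill="black")
            y += line_h

    if top:
        draw_block(top, from_top=True)
    if bottom:
        draw_block(bottom, from_top=False)
    return out
```

Usage would look like `add_caption(image, top="me at 2am", bottom="checking the fridge again").save("meme.png")`, where `image` is any PIL image such as a pipeline output.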
Intended Use
- Generating meme-ready images with space at the top/bottom for captions.
- Creative exploration of humorous/ironic visual setups controlled by prompts.
- Educational/portfolio use for LoRA fine‑tuning workflows with SDXL.
Out of Scope / Limitations
- No text rendering inside the image (explicitly discouraged via negative prompts).
- May produce stock-like aesthetics by design.
- Not suitable for generating or amplifying harmful, hateful, or NSFW content.
- As with all text-to-image systems, prompts with ambiguous semantics can yield unpredictable outputs.
Training Summary
- Base model: SDXL Base 1.0
- LoRA rank / alpha / dropout: r=8, alpha=16, dropout=0.05
- Resolution: 1024 (training); common inference at 768–896 for speed
- Batch: 1 (gradient accumulation = 4)
- Steps: ~9k (≈2 epochs on ~5k images)
- Learning Rate: 0.0001
- Precision: fp16 (LoRA params kept in fp32 during training)
- Optimizer: AdamW
- Scheduler: cosine with warmup (recommended)
- Frozen: VAE, text_encoder, text_encoder_2
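The numbers above can be sanity-checked with a little arithmetic. The sketch below computes the effective batch size, the LoRA scaling factor, and the epoch count under two readings of "steps", then defines a cosine-with-warmup learning-rate curve. The warmup length of 500 steps and the decay to zero are assumptions for illustration; the card does not state them.

```python
import math

# Values from the Training Summary
batch_size = 1
grad_accum = 4
total_steps = 9_000
num_images = 5_000
base_lr = 1e-4
lora_r, lora_alpha = 8, 16

# Assumption for illustration only: the warmup length is not stated in the card
warmup_steps = 500

effective_batch = batch_size * grad_accum   # samples per optimizer update
lora_scale = lora_alpha / lora_r            # standard LoRA scaling factor

# Two ways to read "~9k steps"
epochs_if_micro_steps = total_steps * batch_size / num_images            # 1.8
epochs_if_optimizer_steps = total_steps * effective_batch / num_images   # 7.2


def lr_at(step: int) -> float:
    """Linear warmup to base_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


for step in (0, 250, 500, 4_750, 9_000):
    print(step, f"{lr_at(step):.2e}")
```

The "≈2 epochs" figure matches the first reading, where a step is one forward/backward pass on a single image.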
Data
- Source: Hateful Memes (Facebook AI).
- We excluded labeled hateful samples and applied NLP enrichment:
  - Emotion scoring (GoEmotions distilled) and irony scoring (RoBERTa‑irony).
  - Heuristics + percentiles → tones: humor / irony / neutral.
- Final training CSV: prompts balanced by tone; negative prompts to avoid text overlays, low‑quality artifacts, watermarks/logos, and unsafe content.
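The percentile-based tone labeling could look roughly like the sketch below. The exact heuristics and cut-offs used for this model are not documented here, so the 75th-percentile threshold, the tie-breaking rule, and the `humor`/`irony` score keys are all hypothetical. Balancing is shown as a simple deterministic downsample.

```python
from collections import Counter
from statistics import quantiles


def percentile(values, q):
    """q-th percentile (1..99) using the stdlib's inclusive method."""
    return quantiles(values, n=100, method="inclusive")[q - 1]


def assign_tones(rows, q=75):
    """Label each row humor / irony / neutral from its scores.

    rows: dicts with hypothetical keys "humor" and "irony" in [0, 1].
    A row qualifies for a tone if its score reaches the q-th percentile of
    that score; if both qualify, the larger score wins; otherwise neutral.
    """
    humor_cut = percentile([r["humor"] for r in rows], q)
    irony_cut = percentile([r["irony"] for r in rows], q)
    labeled = []
    for r in rows:
        is_humor = r["humor"] >= humor_cut
        is_irony = r["irony"] >= irony_cut
        if is_humor and (not is_irony or r["humor"] >= r["irony"]):
            tone = "humor"
        elif is_irony:
            tone = "irony"
        else:
            tone = "neutral"
        labeled.append({**r, "tone": tone})
    return labeled


def balance_by_tone(rows):
    """Keep the first k rows of each tone, where k is the rarest tone's count."""
    counts = Counter(r["tone"] for r in rows)
    keep = min(counts.values())
    seen = Counter()
    balanced = []
    for r in rows:
        if seen[r["tone"]] < keep:
            seen[r["tone"]] += 1
            balanced.append(r)
    return balanced


# Toy scores, made up for illustration
rows = [
    {"humor": 0.9, "irony": 0.1},
    {"humor": 0.8, "irony": 0.2},
    {"humor": 0.1, "irony": 0.9},
    {"humor": 0.2, "irony": 0.7},
    {"humor": 0.2, "irony": 0.2},
    {"humor": 0.1, "irony": 0.1},
]
print([r["tone"] for r in assign_tones(rows)])
```

A real pipeline would more likely sample at random within each tone instead of keeping the first rows.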
The dataset is not included here. Please obtain Hateful Memes under its original terms and reproduce the preprocessing if needed.
Safety, Ethics & Mitigations
- Samples labeled hateful were filtered out of the training data, and negative prompts are used to avoid NSFW content, hate, and text overlays.
- Despite mitigations, misuse is possible. Users are responsible for prompting responsibly and complying with local laws and platform policies.
- Do not use the model to create defamatory, harassing, discriminatory, or otherwise harmful imagery.
Known risks: dataset biases may remain; aesthetic biases (stock-photo look); occasional failure to respect negative prompts.
How to Use

    import torch
    from diffusers import AutoPipelineForText2Image

    # fp16 needs a GPU; on CPU fall back to fp32 (much slower)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    pipe = AutoPipelineForText2Image.from_pretrained(
        "Ricardouchub/SarcasmDiffusion",
        torch_dtype=dtype,
    ).to(device)

    prompt = (
        "sarcastic meme about checking the fridge for the third time, "
        "centered subject, plain background, high-contrast photo, stock photo style"
    )
    negative = "nsfw, hate speech, slur, watermark, logo, low quality, blurry, busy background, text overlay"

    # Fixed seed for reproducible output
    g = torch.Generator(device=device).manual_seed(123)
    image = pipe(
        prompt,
        negative_prompt=negative,
        num_inference_steps=22,
        guidance_scale=6.3,
        width=896,
        height=896,
        generator=g,
    ).images[0]
    image.save("sample.png")

Prompting Tips
- Add layout hints: “centered subject”, “plain background”, “space at top and bottom”.
- Keep negative prompts to avoid logos/text/NSFW.
- Use seeds for reproducibility; steps=18–28, guidance=5.5–7.5, size=768–1024.
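The tips above can be wrapped in a small helper so that every generation uses the same layout hints, negative terms, and setting ranges. This is only a sketch: `build_prompts`, `GenSettings`, and the default values are hypothetical names chosen for illustration and are not shipped with the model.

```python
from dataclasses import dataclass

LAYOUT_HINTS = ("centered subject", "plain background", "space at top and bottom")
NEGATIVE_TERMS = ("nsfw", "hate speech", "slur", "watermark", "logo",
                  "low quality", "blurry", "busy background", "text overlay")


def clamp(value, low, high):
    return max(low, min(high, value))


@dataclass(frozen=True)
class GenSettings:
    steps: int = 22
    guidance: float = 6.3
    size: int = 896
    seed: int = 123

    def clamped(self):
        """Return a copy pulled into the ranges suggested in the tips."""
        size = clamp(self.size, 768, 1024)
        size -= size % 8   # SDXL pipelines require dimensions divisible by 8
        return GenSettings(
            steps=clamp(self.steps, 18, 28),
            guidance=clamp(self.guidance, 5.5, 7.5),
            size=size,
            seed=self.seed,
        )


def build_prompts(subject, extra_style="stock photo style"):
    """Compose a positive prompt with layout hints plus a negative prompt."""
    prompt = ", ".join([f"sarcastic meme about {subject}", *LAYOUT_HINTS, extra_style])
    negative = ", ".join(NEGATIVE_TERMS)
    return prompt, negative


prompt, negative = build_prompts("checking the fridge for the third time")
settings = GenSettings(steps=40, guidance=9.0, size=1000).clamped()

# Keyword arguments in the shape the diffusers pipeline call expects
pipe_kwargs = dict(
    prompt=prompt,
    negative_prompt=negative,
    num_inference_steps=settings.steps,
    guidance_scale=settings.guidance,
    width=settings.size,
    height=settings.size,
)
print(pipe_kwargs)
```

The resulting dictionary can be passed as `pipe(**pipe_kwargs, generator=g)` with a seeded `torch.Generator`.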
Environment & Compatibility
To ensure full compatibility when loading this model (fused SDXL with LoRA merged), use the following library versions:
| Library | Recommended Version | Notes |
|---|---|---|
| Python | 3.10 – 3.12 | Tested on Colab (Python 3.12) |
| PyTorch | 2.6.0 + CUDA 12.4 | Any CUDA ≥ 12 works |
| diffusers | 0.35.1 | Core inference & model loading |
| transformers | 4.45.2 | Required for SDXL CLIP text encoder (CLIPTextModel) compatibility |
| accelerate | 1.10.1 | Device and fp16 inference management |
| huggingface_hub | 0.23.5 | Compatible with diffusers 0.35.x |
| safetensors | ≥ 0.4.5 | For secure model weights loading |
Install in Colab or local environment:

    pip install "diffusers==0.35.1" "transformers==4.45.2" "accelerate==1.10.1" "huggingface_hub==0.23.5" safetensors

Important:
Using newer versions (e.g., transformers ≥ 4.56) may break compatibility due to API changes in CLIPTextModel (the offload_state_dict argument).
Always match the versions above for smooth loading.
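A quick way to confirm the environment before loading the model is to compare installed versions against the pins in the table. The sketch below uses only the standard library; the helper names `installed_version` and `check_pins` are illustrative, not part of any package.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins from the compatibility table
PINS = {
    "diffusers": "0.35.1",
    "transformers": "4.45.2",
    "accelerate": "1.10.1",
    "huggingface_hub": "0.23.5",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(pins=PINS, lookup=installed_version):
    """Return {package: (expected, found)} for every package that does not match."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = check_pins()
    for package, (expected, found) in problems.items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
    if not problems:
        print("All pinned versions match.")
```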
License
- Code: MIT
- Model weights: follow the base model’s license (Stability AI / SDXL Base 1.0).
- Data: Users must obtain Hateful Memes from its source and agree to its terms.
By using this model, you agree not to generate content that is illegal, harmful, or violates rights of others.
Evaluation
Qualitative assessment via fixed prompt sheets (humor/irony/neutral). Suggested automatic metrics for future work: CLIP‑score vs. caption, aesthetic predictors, and human preference studies.
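A fixed prompt sheet can be reproduced by crossing a small set of prompts per tone with fixed seeds and tiling the results into a contact sheet for side-by-side review. The sketch below shows one way to do that with Pillow. The prompts other than the fridge example, the seed values, and the helper names are hypothetical, and the actual sheets used for this model are not included.

```python
from PIL import Image

# Hypothetical prompt sheet: one prompt per tone, evaluated at fixed seeds
PROMPT_SHEET = {
    "humor": ["sarcastic meme about checking the fridge for the third time"],
    "irony": ["ironic meme about a meeting that could have been an email"],
    "neutral": ["person sitting at a desk, stock photo style"],
}
SEEDS = (123, 456)


def sheet_jobs(sheet=PROMPT_SHEET, seeds=SEEDS):
    """List every (tone, prompt, seed) combination in a stable order."""
    return [(tone, prompt, seed)
            for tone, prompts in sheet.items()
            for prompt in prompts
            for seed in seeds]


def make_grid(images, cols, pad=8, background="white"):
    """Paste equally sized PIL images into a grid, row by row."""
    if not images:
        raise ValueError("need at least one image")
    w, h = images[0].size
    rows = -(-len(images) // cols)   # ceiling division
    sheet = Image.new(
        "RGB",
        (cols * w + (cols + 1) * pad, rows * h + (rows + 1) * pad),
        background,
    )
    for i, img in enumerate(images):
        r, c = divmod(i, cols)
        sheet.paste(img, (pad + c * (w + pad), pad + r * (h + pad)))
    return sheet
```

With a loaded pipeline, each job would produce one image, and `make_grid(images, cols=len(SEEDS))` would put one prompt per row and one seed per column.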
Acknowledgments
- Stability AI — SDXL Base 1.0
- Hugging Face — Diffusers, Accelerate, PEFT
- Facebook AI — Hateful Memes dataset