CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

This is an AI-powered tattoo search engine that combines visual similarity search with image captioning. Users upload a tattoo image, and the system finds visually similar tattoos from across the web using multi-model embeddings and multi-platform search.

Tech Stack: FastAPI, PyTorch, HuggingFace Transformers, OpenCLIP, DINOv2, SigLIP

Deployment: Dockerized application designed for HuggingFace Spaces (GPU recommended)

Development Commands

Running the Application

# Local development
python app.py

# Docker build and run
docker build -t tattoo-search .
docker run -p 7860:7860 --env-file .env tattoo-search

Environment Setup

Required environment variable:

  • HF_TOKEN: HuggingFace API token (required for GLM-4.5V captioning via Novita provider)

Create .env file:

HF_TOKEN=your_token_here

Testing Endpoints

# Health check
curl http://localhost:7860/health

# Get available models
curl http://localhost:7860/models

# Search with image
curl -X POST http://localhost:7860/search \
  -F "file=@tattoo.jpg" \
  -F "embedding_model=clip" \
  -F "include_patch_attention=false"

Architecture

Core Pipeline Flow

  1. Image Upload → FastAPI endpoint (/search in main.py)
  2. Caption Generation → GLM-4.5V via HuggingFace InferenceClient (Novita provider)
  3. Multi-Platform Search → SearchEngineManager coordinates searches across Pinterest, Reddit, Instagram
  4. URL Validation → URLValidator filters valid/accessible images
  5. Embedding Extraction → Selected model (CLIP/DINOv2/SigLIP) encodes query + candidates
  6. Similarity Computation → Cosine similarity ranking in parallel
  7. Optional Patch Analysis → PatchAttentionAnalyzer for detailed visual correspondence
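The flow above can be condensed into a single function. This is a minimal sketch, not the real API: every helper below is a stub standing in for the component named in the corresponding step.

```python
# Illustrative stand-ins for the real pipeline components.
def generate_caption(image_bytes):            # step 2: GLM-4.5V captioning
    return "traditional rose tattoo"

def search_platforms(caption):                # step 3: multi-platform search
    return ["https://example.com/a.jpg", "https://example.com/b.jpg"]

def is_valid_url(url):                        # step 4: URL validation
    return url.startswith("https://")

def encode(image_bytes):                      # step 5: embedding extraction
    return [1.0, 0.0]

def rank_by_cosine(query_vec, urls):          # step 6: similarity ranking
    return sorted(urls)                       # placeholder ranking

def search_pipeline(image_bytes, include_patch_attention=False):
    caption = generate_caption(image_bytes)
    candidates = [u for u in search_platforms(caption) if is_valid_url(u)]
    results = rank_by_cosine(encode(image_bytes), candidates)
    # step 7 (optional): PatchAttentionAnalyzer would run here per result
    return {"caption": caption, "results": results}
```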

Key Components

main.py - TattooSearchEngine Class

  • Main orchestration class that ties all components together
  • generate_caption(): Uses HuggingFace InferenceClient with GLM-4.5V model
  • search_images(): Delegates to SearchEngineManager with caching
  • download_and_process_image(): Parallel image download and similarity computation
  • compute_similarity(): ThreadPoolExecutor for concurrent processing with early stopping

embeddings.py - Model Abstraction

  • EmbeddingModel: Abstract base class defining interface
  • CLIPEmbedding: OpenAI CLIP ViT-B/32 (default)
  • DINOv2Embedding: Meta's self-supervised vision transformer
  • SigLIPEmbedding: Google's improved CLIP-like model
  • EmbeddingModelFactory: Factory pattern for model instantiation with fallback
  • All models support both global image embeddings and patch-level features
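The interface can be sketched as an abstract base class. Method names follow the "Adding a New Embedding Model" section below; the docstrings are illustrative, not the real signatures.

```python
from abc import ABC, abstractmethod

class EmbeddingModel(ABC):
    """Sketch of the abstract interface; concrete models (CLIP, DINOv2,
    SigLIP) implement all four methods."""

    @abstractmethod
    def load_model(self):
        """Load weights onto the available device."""

    @abstractmethod
    def encode_image(self, image):
        """Return a single global embedding vector for the image."""

    @abstractmethod
    def encode_image_patches(self, image):
        """Return per-patch features for attention analysis."""

    @abstractmethod
    def get_model_name(self):
        """Human-readable model identifier, e.g. 'CLIP-ViT-B-32'."""
```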

search_engines/ - Multi-Platform Search

  • SearchEngineManager: Coordinates parallel searches across platforms with fallback strategies
  • BaseSearchEngine: Abstract interface for platform-specific engines
  • Platform implementations: PinterestSearchEngine, RedditSearchEngine, InstagramSearchEngine
  • SearchResult and ImageResult: Data classes for structured results
  • Includes intelligent query simplification for fallback searches

patch_attention.py - Visual Correspondence

  • PatchAttentionAnalyzer: Computes patch-level attention matrices between images
  • compute_patch_similarities(): Extracts patch features and computes attention
  • visualize_attention_heatmap(): Creates matplotlib visualizations as base64 PNG
  • Returns attention matrices showing which image regions correspond best

utils/ - Supporting Utilities

  • SearchCache: In-memory LRU cache with TTL for search results
  • URLValidator: Concurrent URL validation to filter broken/inaccessible images

Model Selection Logic

The search engine supports dynamic model switching via get_search_engine():

  • Global singleton pattern with lazy initialization
  • Models are swapped only when a different embedding model is requested
  • Each model implements both global pooling and patch-level encoding
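The singleton-with-swap pattern can be sketched as follows; `TattooSearchEngine` here is a minimal stand-in for the real class, which loads the requested model in its constructor.

```python
class TattooSearchEngine:
    def __init__(self, model_name):
        self.model_name = model_name  # real class would load the model here

_engine = None

def get_search_engine(model_name="clip"):
    """Lazy global singleton: rebuild only when a different
    embedding model is requested."""
    global _engine
    if _engine is None or _engine.model_name != model_name:
        _engine = TattooSearchEngine(model_name)
    return _engine
```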

Search Strategy

SearchEngineManager uses a tiered approach:

  1. Primary platforms (Pinterest, Reddit) searched first
  2. If results < threshold, try additional platforms (Instagram)
  3. If still insufficient, simplify query and retry
  4. All platform searches run concurrently via ThreadPoolExecutor
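The tiered approach can be sketched with stub engines. The real `SearchEngineManager` dispatches the platform searches concurrently; this sequential version only shows the fallback logic.

```python
def search_with_fallback(query, primary, extra, threshold=10,
                         simplify=lambda q: q.split()[0]):
    """Tier 1: primary platforms; tier 2: additional platforms;
    tier 3: retry everything with a simplified query."""
    results = [r for engine in primary for r in engine(query)]
    if len(results) < threshold:
        results += [r for engine in extra for r in engine(query)]
    if len(results) < threshold:
        short = simplify(query)  # e.g. keep only the leading keyword
        results += [r for engine in primary + extra for r in engine(short)]
    return results
```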

Caching Strategy

  • Search results cached by query + max_results hash
  • Default TTL: 1 hour (3600s)
  • Max cache size: 1000 entries with LRU eviction
  • Significantly reduces redundant searches
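The behaviour above (hash-based key, TTL expiry, LRU eviction) can be sketched in a few lines. This is an illustrative stand-in, not the actual `SearchCache` implementation.

```python
import hashlib
import time
from collections import OrderedDict

class SearchCache:
    def __init__(self, max_size=1000, ttl=3600):
        self.max_size, self.ttl = max_size, ttl
        self._data = OrderedDict()  # key -> (timestamp, value)

    @staticmethod
    def make_key(query, max_results):
        # Cache key derived from query + max_results, as described above.
        return hashlib.sha256(f"{query}:{max_results}".encode()).hexdigest()

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        ts, value = item
        if time.time() - ts > self.ttl:   # entry expired
            del self._data[key]
            return None
        self._data.move_to_end(key)       # mark as most recently used
        return value

    def set(self, key, value):
        self._data[key] = (time.time(), value)
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used
```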

Important Implementation Details

Caption Generation

  • Uses GLM-4.5V via HuggingFace InferenceClient with Novita provider
  • Converts PIL image to base64 data URL
  • Expects JSON response with "search_query" field
  • Fallback to "tattoo artwork" on failure
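The data-URL conversion can be sketched from raw bytes; the real code serializes a PIL image to PNG bytes first, which is omitted here to keep the sketch stdlib-only.

```python
import base64

def to_data_url(image_bytes, mime="image/png"):
    """Encode raw image bytes as a base64 data URL suitable for
    passing to a vision-language model endpoint."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```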

Image Download Headers

  • Platform-specific headers (Pinterest, Instagram optimizations)
  • Random user agent rotation
  • Content-type and size validation (10MB limit, min 50x50px)
  • Exponential backoff retry mechanism
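The exponential-backoff retry can be sketched generically; the real download code wraps an HTTP GET rather than an arbitrary callable, and the delays here are illustrative.

```python
import time

def retry_with_backoff(fn, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), doubling the delay after each failure:
    base_delay, 2*base_delay, 4*base_delay, ..."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise                       # out of attempts: propagate
            sleep(base_delay * (2 ** attempt))
```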

Similarity Computation

  • Early stopping optimization: stops at 20 good results (5 if patch attention enabled)
  • ThreadPoolExecutor with max 10 workers
  • Rate limiting with 0.1s delays between downloads
  • Future cancellation after target reached
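The early-stopping pattern can be sketched with `concurrent.futures`: consume results as they complete and cancel outstanding futures once the target is reached. The scoring function here is a stand-in for the real download-and-score step.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def score_until_target(urls, score_fn, target=20, max_workers=10):
    """Score candidates concurrently, stopping early at `target` results."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(score_fn, url) for url in urls]
        for future in as_completed(futures):
            results.append(future.result())
            if len(results) >= target:   # early stopping
                for f in futures:
                    f.cancel()           # cancel not-yet-started work
                break
    return results
```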

Patch Attention

  • Only triggered when include_patch_attention=true
  • Computes NxM attention matrix (query patches × candidate patches)
  • Visualizations include: attention heatmap, patch grid overlays, top correspondences
  • Returns base64-encoded PNG images
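The NxM attention matrix reduces to pairwise cosine similarity between patch feature vectors. This pure-Python sketch shows the shape of the computation; the real `PatchAttentionAnalyzer` operates on model patch features as tensors.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def patch_attention_matrix(query_patches, candidate_patches):
    """Entry [i][j] = cosine similarity of query patch i to
    candidate patch j (an N x M matrix)."""
    return [[cosine(q, c) for c in candidate_patches] for q in query_patches]
```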

API Response Structures

POST /search returns:

{
  "caption": "string",
  "results": [
    {
      "score": 0.95,
      "url": "https://...",
      "patch_attention": {  // optional
        "overall_similarity": 0.87,
        "query_grid_size": 7,
        "candidate_grid_size": 7,
        "attention_summary": {...}
      }
    }
  ],
  "embedding_model": "CLIP-ViT-B-32",
  "patch_attention_enabled": false
}

POST /analyze-attention returns detailed patch analysis with visualizations

Common Development Patterns

Adding a New Embedding Model

  1. Create new class in embeddings.py inheriting from EmbeddingModel
  2. Implement load_model(), encode_image(), encode_image_patches(), get_model_name()
  3. Add to EmbeddingModelFactory.AVAILABLE_MODELS
  4. Add config to get_default_model_configs()
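The steps above can be sketched with a hypothetical `MyModelEmbedding`. The base class here is a simplified stand-in (redefined so the sketch is self-contained), and `AVAILABLE_MODELS` mirrors the factory registry by name only; bodies are placeholders, not real model code.

```python
class EmbeddingModel:                          # stand-in for the real base class
    def load_model(self): raise NotImplementedError
    def encode_image(self, image): raise NotImplementedError
    def encode_image_patches(self, image): raise NotImplementedError
    def get_model_name(self): raise NotImplementedError

class MyModelEmbedding(EmbeddingModel):        # step 1: inherit from EmbeddingModel
    def load_model(self):                      # step 2: implement the interface
        self.model = object()                  # real code would load weights here
    def encode_image(self, image):
        return [0.0] * 512                     # global embedding (placeholder)
    def encode_image_patches(self, image):
        return [[0.0] * 512 for _ in range(49)]  # patch features (placeholder)
    def get_model_name(self):
        return "my-model"

AVAILABLE_MODELS = {"mymodel": MyModelEmbedding}  # step 3: register in the factory
```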

Adding a New Search Platform

  1. Create new engine in search_engines/ inheriting from BaseSearchEngine
  2. Add platform to SearchPlatform enum in base.py
  3. Implement search() and is_valid_url() methods
  4. Add to SearchEngineManager.engines dict
  5. Update platform prioritization in search_with_fallback() if needed
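The steps above can be sketched with a hypothetical `TumblrSearchEngine`. The base class is a simplified stand-in, the platform is invented for illustration, and the `engines` dict mirrors the manager's registry by name only.

```python
class BaseSearchEngine:                        # stand-in for the real base class
    def search(self, query, max_results): raise NotImplementedError
    def is_valid_url(self, url): raise NotImplementedError

class TumblrSearchEngine(BaseSearchEngine):    # step 1: inherit from BaseSearchEngine
    def search(self, query, max_results=10):   # step 3: implement search()
        # Real code would call the platform API; stub URLs here.
        return [f"https://tumblr.example/{query}/{i}.jpg"
                for i in range(max_results)]
    def is_valid_url(self, url):               # step 3: implement is_valid_url()
        return url.startswith("https://tumblr.example/")

engines = {"tumblr": TumblrSearchEngine()}     # step 4: register with the manager
```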

Performance Considerations

  • GPU acceleration used if available (CUDA)
  • Concurrent image downloads (ThreadPoolExecutor)
  • Search result caching to reduce API calls
  • Early stopping in similarity computation
  • Future cancellation after targets met
  • Model instances reused globally to avoid reloading

Deployment Notes

  • Designed for HuggingFace Spaces with Docker SDK
  • Port 7860 (HF Spaces default)
  • Recommended hardware: T4 Small GPU or higher
  • Health check endpoint at /health for monitoring
  • All models download on first use and cache in /app/cache