CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

This is an AI-powered tattoo search engine that combines visual similarity search with image captioning. Users upload a tattoo image, and the system finds visually similar tattoos from across the web using multi-model embeddings and multi-platform search.

Tech Stack: FastAPI, PyTorch, HuggingFace Transformers, OpenCLIP, DINOv2, SigLIP

Deployment: Dockerized application designed for HuggingFace Spaces (GPU recommended)

Development Commands

Running the Application

# Local development
python app.py

# Docker build and run
docker build -t tattoo-search .
docker run -p 7860:7860 --env-file .env tattoo-search

Environment Setup

Required environment variable:

  • HF_TOKEN: HuggingFace API token (required for GLM-4.5V captioning via Novita provider)

Create .env file:

HF_TOKEN=your_token_here

Testing Endpoints

# Health check
curl http://localhost:7860/health

# Get available models
curl http://localhost:7860/models

# Search with image
curl -X POST http://localhost:7860/search \
  -F "file=@tattoo.jpg" \
  -F "embedding_model=clip" \
  -F "include_patch_attention=false"

Architecture

Core Pipeline Flow

  1. Image Upload → FastAPI endpoint (/search in main.py)
  2. Caption Generation → GLM-4.5V via HuggingFace InferenceClient (Novita provider)
  3. Multi-Platform Search → SearchEngineManager coordinates searches across Pinterest, Reddit, Instagram
  4. URL Validation → URLValidator filters valid/accessible images
  5. Embedding Extraction → Selected model (CLIP/DINOv2/SigLIP) encodes query + candidates
  6. Similarity Computation → Cosine similarity ranking in parallel
  7. Optional Patch Analysis → PatchAttentionAnalyzer for detailed visual correspondence
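The flow above can be condensed into a single function. This is a minimal sketch, not the real API: every helper below is a stub standing in for the component named in the corresponding step.

```python
# Illustrative stand-ins for the real pipeline components.
def generate_caption(image_bytes):            # step 2: GLM-4.5V captioning
    return "traditional rose tattoo"

def search_platforms(caption):                # step 3: multi-platform search
    return ["https://example.com/a.jpg", "https://example.com/b.jpg"]

def is_valid_url(url):                        # step 4: URL validation
    return url.startswith("https://")

def encode(image_bytes):                      # step 5: embedding extraction
    return [1.0, 0.0]

def rank_by_cosine(query_vec, urls):          # step 6: similarity ranking
    return sorted(urls)                       # placeholder ranking

def search_pipeline(image_bytes, include_patch_attention=False):
    caption = generate_caption(image_bytes)
    candidates = [u for u in search_platforms(caption) if is_valid_url(u)]
    results = rank_by_cosine(encode(image_bytes), candidates)
    # step 7 (optional): PatchAttentionAnalyzer would run here per result
    return {"caption": caption, "results": results}
```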

Key Components

main.py - TattooSearchEngine Class

  • Main orchestration class that ties all components together
  • generate_caption(): Uses HuggingFace InferenceClient with GLM-4.5V model
  • search_images(): Delegates to SearchEngineManager with caching
  • download_and_process_image(): Parallel image download and similarity computation
  • compute_similarity(): ThreadPoolExecutor for concurrent processing with early stopping

embeddings.py - Model Abstraction

  • EmbeddingModel: Abstract base class defining interface
  • CLIPEmbedding: OpenAI CLIP ViT-B/32 (default)
  • DINOv2Embedding: Meta's self-supervised vision transformer
  • SigLIPEmbedding: Google's improved CLIP-like model
  • EmbeddingModelFactory: Factory pattern for model instantiation with fallback
  • All models support both global image embeddings and patch-level features
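The interface can be sketched as an abstract base class. Method names follow the "Adding a New Embedding Model" section below; the docstrings are illustrative, not the real signatures.

```python
from abc import ABC, abstractmethod

class EmbeddingModel(ABC):
    """Sketch of the abstract interface; concrete models (CLIP, DINOv2,
    SigLIP) implement all four methods."""

    @abstractmethod
    def load_model(self):
        """Load weights onto the available device."""

    @abstractmethod
    def encode_image(self, image):
        """Return a single global embedding vector for the image."""

    @abstractmethod
    def encode_image_patches(self, image):
        """Return per-patch features for attention analysis."""

    @abstractmethod
    def get_model_name(self):
        """Human-readable model identifier, e.g. 'CLIP-ViT-B-32'."""
```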

search_engines/ - Multi-Platform Search

  • SearchEngineManager: Coordinates parallel searches across platforms with fallback strategies
  • BaseSearchEngine: Abstract interface for platform-specific engines
  • Platform implementations: PinterestSearchEngine, RedditSearchEngine, InstagramSearchEngine
  • SearchResult and ImageResult: Data classes for structured results
  • Includes intelligent query simplification for fallback searches

patch_attention.py - Visual Correspondence

  • PatchAttentionAnalyzer: Computes patch-level attention matrices between images
  • compute_patch_similarities(): Extracts patch features and computes attention
  • visualize_attention_heatmap(): Creates matplotlib visualizations as base64 PNG
  • Returns attention matrices showing which image regions correspond best

utils/ - Supporting Utilities

  • SearchCache: In-memory LRU cache with TTL for search results
  • URLValidator: Concurrent URL validation to filter broken/inaccessible images

Model Selection Logic

The search engine supports dynamic model switching via get_search_engine():

  • Global singleton pattern with lazy initialization
  • Models are swapped only when a different embedding model is requested
  • Each model implements both global pooling and patch-level encoding
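The singleton-with-swap pattern can be sketched as follows; `TattooSearchEngine` here is a minimal stand-in for the real class, which loads the requested model in its constructor.

```python
class TattooSearchEngine:
    def __init__(self, model_name):
        self.model_name = model_name  # real class would load the model here

_engine = None

def get_search_engine(model_name="clip"):
    """Lazy global singleton: rebuild only when a different
    embedding model is requested."""
    global _engine
    if _engine is None or _engine.model_name != model_name:
        _engine = TattooSearchEngine(model_name)
    return _engine
```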

Search Strategy

SearchEngineManager uses a tiered approach:

  1. Primary platforms (Pinterest, Reddit) searched first
  2. If results < threshold, try additional platforms (Instagram)
  3. If still insufficient, simplify query and retry
  4. All platform searches run concurrently via ThreadPoolExecutor
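The tiered approach can be sketched with stub engines. The real `SearchEngineManager` dispatches the platform searches concurrently; this sequential version only shows the fallback logic.

```python
def search_with_fallback(query, primary, extra, threshold=10,
                         simplify=lambda q: q.split()[0]):
    """Tier 1: primary platforms; tier 2: additional platforms;
    tier 3: retry everything with a simplified query."""
    results = [r for engine in primary for r in engine(query)]
    if len(results) < threshold:
        results += [r for engine in extra for r in engine(query)]
    if len(results) < threshold:
        short = simplify(query)  # e.g. keep only the leading keyword
        results += [r for engine in primary + extra for r in engine(short)]
    return results
```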

Caching Strategy

  • Search results cached by query + max_results hash
  • Default TTL: 1 hour (3600s)
  • Max cache size: 1000 entries with LRU eviction
  • Significantly reduces redundant searches
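The behaviour above (hash-based key, TTL expiry, LRU eviction) can be sketched in a few lines. This is an illustrative stand-in, not the actual `SearchCache` implementation.

```python
import hashlib
import time
from collections import OrderedDict

class SearchCache:
    def __init__(self, max_size=1000, ttl=3600):
        self.max_size, self.ttl = max_size, ttl
        self._data = OrderedDict()  # key -> (timestamp, value)

    @staticmethod
    def make_key(query, max_results):
        # Cache key derived from query + max_results, as described above.
        return hashlib.sha256(f"{query}:{max_results}".encode()).hexdigest()

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        ts, value = item
        if time.time() - ts > self.ttl:   # entry expired
            del self._data[key]
            return None
        self._data.move_to_end(key)       # mark as most recently used
        return value

    def set(self, key, value):
        self._data[key] = (time.time(), value)
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used
```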

Important Implementation Details

Caption Generation

  • Uses GLM-4.5V via HuggingFace InferenceClient with Novita provider
  • Converts PIL image to base64 data URL
  • Expects JSON response with "search_query" field
  • Fallback to "tattoo artwork" on failure
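The data-URL conversion can be sketched from raw bytes; the real code serializes a PIL image to PNG bytes first, which is omitted here to keep the sketch stdlib-only.

```python
import base64

def to_data_url(image_bytes, mime="image/png"):
    """Encode raw image bytes as a base64 data URL suitable for
    passing to a vision-language model endpoint."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```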

Image Download Headers

  • Platform-specific headers (Pinterest, Instagram optimizations)
  • Random user agent rotation
  • Content-type and size validation (10MB limit, min 50x50px)
  • Exponential backoff retry mechanism
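The exponential-backoff retry can be sketched generically; the real download code wraps an HTTP GET rather than an arbitrary callable, and the delays here are illustrative.

```python
import time

def retry_with_backoff(fn, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), doubling the delay after each failure:
    base_delay, 2*base_delay, 4*base_delay, ..."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise                       # out of attempts: propagate
            sleep(base_delay * (2 ** attempt))
```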

Similarity Computation

  • Early stopping optimization: stops at 20 good results (5 if patch attention enabled)
  • ThreadPoolExecutor with max 10 workers
  • Rate limiting with 0.1s delays between downloads
  • Future cancellation after target reached
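The early-stopping pattern can be sketched with `concurrent.futures`: consume results as they complete and cancel outstanding futures once the target is reached. The scoring function here is a stand-in for the real download-and-score step.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def score_until_target(urls, score_fn, target=20, max_workers=10):
    """Score candidates concurrently, stopping early at `target` results."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(score_fn, url) for url in urls]
        for future in as_completed(futures):
            results.append(future.result())
            if len(results) >= target:   # early stopping
                for f in futures:
                    f.cancel()           # cancel not-yet-started work
                break
    return results
```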

Patch Attention

  • Only triggered when include_patch_attention=true
  • Computes NxM attention matrix (query patches × candidate patches)
  • Visualizations include: attention heatmap, patch grid overlays, top correspondences
  • Returns base64-encoded PNG images
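The NxM attention matrix reduces to pairwise cosine similarity between patch feature vectors. This pure-Python sketch shows the shape of the computation; the real `PatchAttentionAnalyzer` operates on model patch features as tensors.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def patch_attention_matrix(query_patches, candidate_patches):
    """Entry [i][j] = cosine similarity of query patch i to
    candidate patch j (an N x M matrix)."""
    return [[cosine(q, c) for c in candidate_patches] for q in query_patches]
```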

API Response Structures

POST /search returns:

{
  "caption": "string",
  "results": [
    {
      "score": 0.95,
      "url": "https://...",
      "patch_attention": {  // optional
        "overall_similarity": 0.87,
        "query_grid_size": 7,
        "candidate_grid_size": 7,
        "attention_summary": {...}
      }
    }
  ],
  "embedding_model": "CLIP-ViT-B-32",
  "patch_attention_enabled": false
}

POST /analyze-attention returns detailed patch analysis with visualizations

Common Development Patterns

Adding a New Embedding Model

  1. Create new class in embeddings.py inheriting from EmbeddingModel
  2. Implement load_model(), encode_image(), encode_image_patches(), get_model_name()
  3. Add to EmbeddingModelFactory.AVAILABLE_MODELS
  4. Add config to get_default_model_configs()
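The steps above can be sketched with a hypothetical `MyModelEmbedding`. The base class here is a simplified stand-in (redefined so the sketch is self-contained), and `AVAILABLE_MODELS` mirrors the factory registry by name only; bodies are placeholders, not real model code.

```python
class EmbeddingModel:                          # stand-in for the real base class
    def load_model(self): raise NotImplementedError
    def encode_image(self, image): raise NotImplementedError
    def encode_image_patches(self, image): raise NotImplementedError
    def get_model_name(self): raise NotImplementedError

class MyModelEmbedding(EmbeddingModel):        # step 1: inherit from EmbeddingModel
    def load_model(self):                      # step 2: implement the interface
        self.model = object()                  # real code would load weights here
    def encode_image(self, image):
        return [0.0] * 512                     # global embedding (placeholder)
    def encode_image_patches(self, image):
        return [[0.0] * 512 for _ in range(49)]  # patch features (placeholder)
    def get_model_name(self):
        return "my-model"

AVAILABLE_MODELS = {"mymodel": MyModelEmbedding}  # step 3: register in the factory
```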

Adding a New Search Platform

  1. Create new engine in search_engines/ inheriting from BaseSearchEngine
  2. Add platform to SearchPlatform enum in base.py
  3. Implement search() and is_valid_url() methods
  4. Add to SearchEngineManager.engines dict
  5. Update platform prioritization in search_with_fallback() if needed
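The steps above can be sketched with a hypothetical `TumblrSearchEngine`. The base class is a simplified stand-in, the platform is invented for illustration, and the `engines` dict mirrors the manager's registry by name only.

```python
class BaseSearchEngine:                        # stand-in for the real base class
    def search(self, query, max_results): raise NotImplementedError
    def is_valid_url(self, url): raise NotImplementedError

class TumblrSearchEngine(BaseSearchEngine):    # step 1: inherit from BaseSearchEngine
    def search(self, query, max_results=10):   # step 3: implement search()
        # Real code would call the platform API; stub URLs here.
        return [f"https://tumblr.example/{query}/{i}.jpg"
                for i in range(max_results)]
    def is_valid_url(self, url):               # step 3: implement is_valid_url()
        return url.startswith("https://tumblr.example/")

engines = {"tumblr": TumblrSearchEngine()}     # step 4: register with the manager
```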

Performance Considerations

  • GPU acceleration used if available (CUDA)
  • Concurrent image downloads (ThreadPoolExecutor)
  • Search result caching to reduce API calls
  • Early stopping in similarity computation
  • Future cancellation after targets met
  • Model instances reused globally to avoid reloading

Deployment Notes

  • Designed for HuggingFace Spaces with Docker SDK
  • Port 7860 (HF Spaces default)
  • Recommended hardware: T4 Small GPU or higher
  • Health check endpoint at /health for monitoring
  • All models download on first use and cache in /app/cache