# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Overview
This is an AI-powered tattoo search engine that combines visual similarity search with image captioning. Users upload a tattoo image, and the system finds visually similar tattoos from across the web using multi-model embeddings and multi-platform search.
**Tech Stack:** FastAPI, PyTorch, HuggingFace Transformers, OpenCLIP, DINOv2, SigLIP

**Deployment:** Dockerized application designed for HuggingFace Spaces (GPU recommended)
## Development Commands
### Running the Application
```bash
# Local development
python app.py

# Docker build and run
docker build -t tattoo-search .
docker run -p 7860:7860 --env-file .env tattoo-search
```
### Environment Setup
Required environment variable:

- `HF_TOKEN`: HuggingFace API token (required for GLM-4.5V captioning via the Novita provider)

Create `.env` file:

```
HF_TOKEN=your_token_here
```
### Testing Endpoints
```bash
# Health check
curl http://localhost:7860/health

# Get available models
curl http://localhost:7860/models

# Search with image
curl -X POST http://localhost:7860/search \
  -F "[email protected]" \
  -F "embedding_model=clip" \
  -F "include_patch_attention=false"
```
## Architecture
### Core Pipeline Flow
- Image Upload → FastAPI endpoint (`/search` in `main.py`)
- Caption Generation → GLM-4.5V via HuggingFace InferenceClient (Novita provider)
- Multi-Platform Search → `SearchEngineManager` coordinates searches across Pinterest, Reddit, Instagram
- URL Validation → `URLValidator` filters valid/accessible images
- Embedding Extraction → Selected model (CLIP/DINOv2/SigLIP) encodes query + candidates
- Similarity Computation → Cosine similarity ranking in parallel
- Optional Patch Analysis → `PatchAttentionAnalyzer` for detailed visual correspondence
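The same flow, condensed into an illustrative endpoint sketch. This is not the literal code in `main.py`; it only wires together names described elsewhere in this file (`get_search_engine()` from Model Selection Logic, the `TattooSearchEngine` methods from Key Components), and exact signatures may differ:

```python
# Illustrative sketch of the /search wiring; not the literal code in main.py.
import io

from fastapi import FastAPI, Form, UploadFile
from PIL import Image

app = FastAPI()

@app.post("/search")
async def search(
    file: UploadFile,
    embedding_model: str = Form("clip"),
    include_patch_attention: bool = Form(False),
):
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")  # 1. upload
    engine = get_search_engine(embedding_model)   # global engine, lazily initialized
    caption = engine.generate_caption(image)      # 2. GLM-4.5V caption
    candidates = engine.search_images(caption)    # 3. multi-platform search (cached)
    ranked = engine.compute_similarity(           # 4-7. validate, embed, rank, analyze
        image, candidates, include_patch_attention=include_patch_attention
    )
    return {
        "caption": caption,
        "results": ranked,
        "patch_attention_enabled": include_patch_attention,
    }
```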
### Key Components
#### main.py - TattooSearchEngine Class

- Main orchestration class that ties all components together
- `generate_caption()`: Uses HuggingFace InferenceClient with GLM-4.5V model
- `search_images()`: Delegates to `SearchEngineManager` with caching
- `download_and_process_image()`: Parallel image download and similarity computation
- `compute_similarity()`: ThreadPoolExecutor for concurrent processing with early stopping
#### embeddings.py - Model Abstraction

- `EmbeddingModel`: Abstract base class defining the interface (sketched below)
- `CLIPEmbedding`: OpenAI CLIP ViT-B/32 (default)
- `DINOv2Embedding`: Meta's self-supervised vision transformer
- `SigLIPEmbedding`: Google's improved CLIP-like model
- `EmbeddingModelFactory`: Factory pattern for model instantiation with fallback
- All models support both global image embeddings and patch-level features
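The base-class contract, sketched from the method names listed under "Adding a New Embedding Model" below; the real class in `embeddings.py` may carry extra hooks:

```python
from abc import ABC, abstractmethod

import numpy as np
from PIL import Image

class EmbeddingModel(ABC):
    """Contract each embedding backend implements (sketch of the real base class)."""

    @abstractmethod
    def load_model(self) -> None:
        """Download weights and move them to GPU when available."""

    @abstractmethod
    def encode_image(self, image: Image.Image) -> np.ndarray:
        """Return a single global embedding vector for the image."""

    @abstractmethod
    def encode_image_patches(self, image: Image.Image) -> np.ndarray:
        """Return (num_patches, dim) patch-level features."""

    @abstractmethod
    def get_model_name(self) -> str:
        """Human-readable model identifier, e.g. 'CLIP-ViT-B-32'."""
```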
#### search_engines/ - Multi-Platform Search

- `SearchEngineManager`: Coordinates parallel searches across platforms with fallback strategies
- `BaseSearchEngine`: Abstract interface for platform-specific engines
- Platform implementations: `PinterestSearchEngine`, `RedditSearchEngine`, `InstagramSearchEngine`
- `SearchResult` and `ImageResult`: Data classes for structured results
- Includes intelligent query simplification for fallback searches
#### patch_attention.py - Visual Correspondence

- `PatchAttentionAnalyzer`: Computes patch-level attention matrices between images
- `compute_patch_similarities()`: Extracts patch features and computes attention
- `visualize_attention_heatmap()`: Creates matplotlib visualizations as base64 PNG
- Returns attention matrices showing which image regions correspond best
#### utils/ - Supporting Utilities

- `SearchCache`: In-memory LRU cache with TTL for search results
- `URLValidator`: Concurrent URL validation to filter broken/inaccessible images
### Model Selection Logic
The search engine supports dynamic model switching via `get_search_engine()` (sketched below):

- Global singleton pattern with lazy initialization
- Models are swapped only when a different embedding model is requested
- Each model implements both global pooling and patch-level encoding
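In outline, the singleton looks something like this. Only `get_search_engine()` is confirmed above; the `TattooSearchEngine` constructor argument and the `set_embedding_model()` setter are assumptions for illustration:

```python
_engine = None  # module-level singleton

def get_search_engine(model_name: str = "clip"):
    """Return the shared engine, swapping models only when the request changes."""
    global _engine
    if _engine is None:
        _engine = TattooSearchEngine(model_name)   # lazy first-use initialization
    elif _engine.model_name != model_name:
        _engine.set_embedding_model(model_name)    # assumed setter; swap on change
    return _engine
```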
### Search Strategy
`SearchEngineManager` uses a tiered approach (sketched below):

- Primary platforms (Pinterest, Reddit) are searched first
- If results fall below a threshold, additional platforms (Instagram) are tried
- If results are still insufficient, the query is simplified and retried
- All platform searches run concurrently via ThreadPoolExecutor
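A sketch of the tiers. `search_with_fallback()` is named under "Adding a New Search Platform" below; the helpers `_search_platforms()` and `simplify_query()` and the threshold value are illustrative assumptions:

```python
def search_with_fallback(self, query: str, max_results: int, threshold: int = 10):
    # Tier 1: primary platforms
    results = self._search_platforms(["pinterest", "reddit"], query, max_results)
    # Tier 2: widen to additional platforms if below threshold
    if len(results) < threshold:
        results += self._search_platforms(["instagram"], query, max_results)
    # Tier 3: simplify the query and retry
    if len(results) < threshold:
        results += self._search_platforms(
            ["pinterest", "reddit"], simplify_query(query), max_results
        )
    return results[:max_results]
```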
### Caching Strategy
- Search results cached by query + max_results hash
- Default TTL: 1 hour (3600s)
- Max cache size: 1000 entries with LRU eviction
- Significantly reduces redundant searches
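A minimal sketch of such a cache, assuming an `OrderedDict`-based LRU; the real `SearchCache` in `utils/` may differ in detail:

```python
import hashlib
import time
from collections import OrderedDict

class SearchCache:
    """In-memory LRU cache with TTL (illustrative sketch of utils.SearchCache)."""

    def __init__(self, max_size: int = 1000, ttl: float = 3600.0):
        self.max_size, self.ttl = max_size, ttl
        self._store: OrderedDict[str, tuple[float, object]] = OrderedDict()

    @staticmethod
    def make_key(query: str, max_results: int) -> str:
        # Cache key = hash of query + max_results
        return hashlib.sha256(f"{query}:{max_results}".encode()).hexdigest()

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None or time.time() - entry[0] > self.ttl:
            self._store.pop(key, None)        # expired or missing
            return None
        self._store.move_to_end(key)          # mark as recently used
        return entry[1]

    def set(self, key: str, value) -> None:
        self._store[key] = (time.time(), value)
        self._store.move_to_end(key)
        if len(self._store) > self.max_size:  # LRU eviction
            self._store.popitem(last=False)
```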
## Important Implementation Details
### Caption Generation
- Uses GLM-4.5V via HuggingFace `InferenceClient` with the Novita provider
- Converts the PIL image to a base64 data URL
- Expects a JSON response with a `"search_query"` field
- Falls back to `"tattoo artwork"` on failure
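Roughly how that call looks with `huggingface_hub` (a sketch following the notes above; the exact model id and prompt wording are assumptions):

```python
import base64
import io
import json

from huggingface_hub import InferenceClient
from PIL import Image

def generate_caption(image: Image.Image, hf_token: str) -> str:
    # PIL image -> base64 data URL
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    data_url = "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()

    client = InferenceClient(provider="novita", api_key=hf_token)
    response = client.chat.completions.create(
        model="zai-org/GLM-4.5V",   # assumed model id for GLM-4.5V
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_url}},
                {"type": "text",
                 "text": 'Describe this tattoo as JSON: {"search_query": "..."}'},
            ],
        }],
    )
    try:
        return json.loads(response.choices[0].message.content)["search_query"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return "tattoo artwork"     # fallback on failure, per the notes above
```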
### Image Download Headers
- Platform-specific headers (Pinterest, Instagram optimizations)
- Random user agent rotation
- Content-type and size validation (10MB limit, min 50x50px)
- Exponential backoff retry mechanism
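A sketch of that download path. The limits follow the notes above; the user-agent strings are truncated placeholders and the function name is illustrative:

```python
import io
import random
import time

import requests
from PIL import Image

USER_AGENTS = ["Mozilla/5.0 (placeholder A)", "Mozilla/5.0 (placeholder B)"]
MAX_BYTES = 10 * 1024 * 1024   # 10MB size limit
MIN_SIDE = 50                  # minimum 50x50px

def download_image(url: str, retries: int = 3) -> Image.Image | None:
    for attempt in range(retries):
        try:
            resp = requests.get(url, timeout=10,
                                headers={"User-Agent": random.choice(USER_AGENTS)})
            resp.raise_for_status()
            if not resp.headers.get("Content-Type", "").startswith("image/"):
                return None                       # content-type validation
            if len(resp.content) > MAX_BYTES:
                return None                       # size validation
            img = Image.open(io.BytesIO(resp.content)).convert("RGB")
            if min(img.size) < MIN_SIDE:
                return None                       # dimension validation
            return img
        except requests.RequestException:
            time.sleep(2 ** attempt)              # exponential backoff
    return None
```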
### Similarity Computation
- Early stopping optimization: stops at 20 good results (5 if patch attention enabled)
- ThreadPoolExecutor with max 10 workers
- Rate limiting with 0.1s delays between downloads
- Future cancellation after target reached
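The early-stopping pattern, sketched; the real logic lives in `compute_similarity()` in `main.py`, and the function and parameter names here are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def rank_candidates(query_emb, urls, score_fn, target: int = 20):
    """Rank candidates, stopping once `target` good results are collected."""
    results = []
    with ThreadPoolExecutor(max_workers=10) as pool:
        futures = {pool.submit(score_fn, query_emb, url): url for url in urls}
        for fut in as_completed(futures):
            scored = fut.result()
            if scored is not None:
                results.append(scored)
            if len(results) >= target:            # early stopping
                for f in futures:
                    f.cancel()                    # cancel not-yet-started downloads
                break
    return sorted(results, key=lambda r: r["score"], reverse=True)
```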
### Patch Attention
- Only triggered when `include_patch_attention=true`
- Computes NxM attention matrix (query patches × candidate patches)
- Visualizations include: attention heatmap, patch grid overlays, top correspondences
- Returns base64-encoded PNG images
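At its core, the NxM matrix is normalized patch-feature similarity. A NumPy sketch (the real `PatchAttentionAnalyzer` also builds the visualizations):

```python
import numpy as np

def patch_attention_matrix(query_patches: np.ndarray,
                           cand_patches: np.ndarray) -> np.ndarray:
    """Cosine similarity between every query patch and every candidate patch.

    query_patches: (N, D), cand_patches: (M, D) -> returns (N, M).
    """
    q = query_patches / np.linalg.norm(query_patches, axis=1, keepdims=True)
    c = cand_patches / np.linalg.norm(cand_patches, axis=1, keepdims=True)
    return q @ c.T   # [i, j] = how strongly query patch i matches candidate patch j
```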
## API Response Structures
`POST /search` returns:
```json
{
  "caption": "string",
  "results": [
    {
      "score": 0.95,
      "url": "https://...",
      "patch_attention": {          // optional
        "overall_similarity": 0.87,
        "query_grid_size": 7,
        "candidate_grid_size": 7,
        "attention_summary": {...}
      }
    }
  ],
  "embedding_model": "CLIP-ViT-B-32",
  "patch_attention_enabled": false
}
```
`POST /analyze-attention` returns detailed patch analysis with visualizations.
## Common Development Patterns
### Adding a New Embedding Model
- Create new class in `embeddings.py` inheriting from `EmbeddingModel`
- Implement `load_model()`, `encode_image()`, `encode_image_patches()`, `get_model_name()`
- Add to `EmbeddingModelFactory.AVAILABLE_MODELS`
- Add config to `get_default_model_configs()` (see the skeleton below)
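A skeleton of the first two steps. The class name and registration key are placeholders, and the dict-style registration is an assumption about how `AVAILABLE_MODELS` is shaped:

```python
import numpy as np
from PIL import Image

class MyNewEmbedding(EmbeddingModel):          # placeholder backend name
    def load_model(self) -> None:
        ...  # e.g. load weights via transformers/open_clip, move to CUDA if available

    def encode_image(self, image: Image.Image) -> np.ndarray:
        ...  # return one global embedding vector

    def encode_image_patches(self, image: Image.Image) -> np.ndarray:
        ...  # return (num_patches, dim) patch features

    def get_model_name(self) -> str:
        return "my-new-model"

# Registration (assumed dict shape):
# EmbeddingModelFactory.AVAILABLE_MODELS["mynew"] = MyNewEmbedding
```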
### Adding a New Search Platform
- Create new engine in `search_engines/` inheriting from `BaseSearchEngine` (see the skeleton below)
- Add platform to `SearchPlatform` enum in `base.py`
- Implement `search()` and `is_valid_url()` methods
- Add to `SearchEngineManager.engines` dict
- Update platform prioritization in `search_with_fallback()` if needed
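A matching skeleton. The platform name and module path are placeholders, and binding the engine to the enum via a `platform` class attribute is an assumption about `BaseSearchEngine`:

```python
# search_engines/example.py (sketch; "example" is a placeholder platform)
from .base import BaseSearchEngine, SearchPlatform, SearchResult

class ExampleSearchEngine(BaseSearchEngine):
    platform = SearchPlatform.EXAMPLE   # add EXAMPLE to the enum in base.py first

    def search(self, query: str, max_results: int) -> SearchResult:
        ...  # call the platform's API and return structured ImageResult entries

    def is_valid_url(self, url: str) -> bool:
        return "example.com" in url     # cheap platform-specific URL filter
```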
## Performance Considerations
- GPU acceleration used if available (CUDA)
- Concurrent image downloads (ThreadPoolExecutor)
- Search result caching to reduce API calls
- Early stopping in similarity computation
- Future cancellation after targets met
- Model instances reused globally to avoid reloading
## Deployment Notes
- Designed for HuggingFace Spaces with Docker SDK
- Port 7860 (HF Spaces default)
- Recommended hardware: T4 Small GPU or higher
- Health check endpoint at `/health` for monitoring
- All models download on first use and cache in `/app/cache`