# Vestir Clothing Similarity Model v2
A MobileNetV3-Small backbone producing 128-dimensional, L2-normalized embeddings for visual clothing similarity.
## Model Details
| Property | Value |
|---|---|
| Architecture | MobileNetV3-Small + embedding head |
| Parameters | 1.1M |
| Embedding dim | 128 |
| Input size | 224x224 RGB |
| ONNX INT8 size | 1.2 MB |
| ONNX FP32 size | 4.2 MB |
## Training (v2)
Key improvements over v1:
- Triplet loss (anchor/positive/negative) instead of contrastive loss
- Hard negative mining: 80% of negatives are same-color, different-subcategory pairs
- Strong color jitter (0.6 brightness/contrast/saturation) forces shape learning over color
- Subcategory-level labels instead of broad master categories
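The 80/20 hard-negative sampling scheme above can be sketched as follows (a minimal illustration; the `color` and `subcategory` field names are assumptions, not the dataset's actual schema):

```python
import random

def sample_negative(anchor, items, hard_ratio=0.8):
    """Pick a negative for the anchor: with probability `hard_ratio`, a hard
    negative (same color, different subcategory); otherwise any item from a
    different subcategory."""
    hard = [i for i in items
            if i["color"] == anchor["color"]
            and i["subcategory"] != anchor["subcategory"]]
    easy = [i for i in items if i["subcategory"] != anchor["subcategory"]]
    pool = hard if (hard and random.random() < hard_ratio) else easy
    return random.choice(pool)

items = [
    {"id": 0, "color": "black", "subcategory": "tshirt"},
    {"id": 1, "color": "black", "subcategory": "jeans"},   # hard negative for id 0
    {"id": 2, "color": "red",   "subcategory": "dress"},
]
neg = sample_negative(items[0], items)
```

Any sampled negative is guaranteed to come from a different subcategory; the hard pool additionally matches the anchor's color.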
Dataset: ashraq/fashion-product-images-small with 5000 training and 1000 validation images.
Training config:
- Optimizer: AdamW (lr=1e-4, weight_decay=0.01)
- Loss: TripletMarginLoss (margin=0.5)
- Augmentation: RandomCrop, HorizontalFlip, ColorJitter(0.8), RandomGrayscale(0.1)
- Hardware: NVIDIA GTX 1050 Ti (4GB VRAM)
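Under those settings, a single training step looks roughly like this (the embedder below is a stand-in linear model for illustration, not the real MobileNetV3-Small backbone):

```python
import torch
import torch.nn as nn

# Stand-in embedder; the real model is MobileNetV3-Small + a 128-dim head.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 128))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
loss_fn = nn.TripletMarginLoss(margin=0.5)

# One step on a random (anchor, positive, negative) batch.
anchor, positive, negative = (torch.randn(4, 3, 224, 224) for _ in range(3))
emb = lambda x: nn.functional.normalize(model(x), dim=1)  # L2-normalize rows
loss = loss_fn(emb(anchor), emb(positive), emb(negative))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```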
## Metrics
### v2 (current)
| Metric | Hard-negative eval | Standard eval |
|---|---|---|
| Accuracy | 88.9% | 92.4% |
| F1 | 0.897 | 0.948 |
| Precision | 0.837 | 0.945 |
| Optimal threshold | 0.78 | 0.65 |
### v1 (previous)
| Metric | Hard-negative eval | Standard eval |
|---|---|---|
| Accuracy | 82.5% | 95.1% |
| F1 | 0.838 | 0.967 |
| diff_subcat_same_color acc | 6.2% | N/A |
v2 trades some easy-case accuracy (standard eval: 95.1% → 92.4%) for a dramatic gain on the hardest cases — same-color, different-garment-type pairs — where v1 scored only 6.2%.
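Because the embeddings are L2-normalized, similarity is cosine similarity (a plain dot product), and the optimal threshold from the hard-negative eval (0.78) turns it into a same-garment decision. A sketch:

```python
import numpy as np

THRESHOLD = 0.78  # optimal threshold from the hard-negative eval

def is_similar(a, b, threshold=THRESHOLD):
    """Embeddings are L2-normalized, so the dot product is cosine similarity."""
    return float(np.dot(a, b)) >= threshold

rng = np.random.default_rng(0)
e1 = rng.standard_normal(128); e1 /= np.linalg.norm(e1)
e2 = e1.copy()                                            # identical garment
e3 = rng.standard_normal(128); e3 /= np.linalg.norm(e3)   # unrelated garment

print(is_similar(e1, e2), is_similar(e1, e3))
```

For random 128-dim unit vectors the expected cosine similarity is near zero, so unrelated pairs sit far below either threshold.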
## Usage
### ONNX Runtime (browser)
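In the browser, inference runs through onnxruntime-web against the INT8 model. A hedged sketch of the equivalent flow (shown here in Python for consistency with the rest of the examples; the ONNX input name, file name, and normalization are assumptions):

```python
import numpy as np

def preprocess(img_u8):
    """HWC uint8 RGB (224x224) -> NCHW float32 in [0, 1].
    The exact normalization (e.g. ImageNet mean/std) is an assumption."""
    x = img_u8.astype(np.float32) / 255.0
    return x.transpose(2, 0, 1)[None]  # shape (1, 3, 224, 224)

# With the model file available, inference would look like:
# import onnxruntime as ort
# session = ort.InferenceSession("model_int8.onnx")       # file name assumed
# emb, = session.run(None, {"input": preprocess(img)})    # input name assumed

x = preprocess(np.zeros((224, 224, 3), dtype=np.uint8))
```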
### PyTorch
## Files
- INT8 quantized ONNX (1.2 MB, for browser)
- FP32 ONNX (4.2 MB, reference)
- PyTorch checkpoint with training metrics
## Part of Vestir
This model powers the merge suggestion feature in Vestir, a virtual wardrobe app. It identifies visually similar garments in a user's closet that could potentially be merged or consolidated.
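As an illustration (not the app's actual code), finding merge candidates reduces to thresholded pairwise cosine similarity over the closet's embeddings:

```python
import numpy as np

def merge_candidates(embeddings, threshold=0.78):
    """embeddings: NxD array with L2-normalized rows. Returns index pairs
    (i, j) with i < j whose cosine similarity clears the threshold."""
    sims = embeddings @ embeddings.T
    i, j = np.triu_indices(len(embeddings), k=1)
    return [(int(a), int(b)) for a, b in zip(i, j) if sims[a, b] >= threshold]

rng = np.random.default_rng(1)
closet = rng.standard_normal((4, 128))
closet /= np.linalg.norm(closet, axis=1, keepdims=True)
closet[3] = closet[0]  # a duplicate garment in the closet

print(merge_candidates(closet))
```

Only the duplicated pair clears the 0.78 threshold; unrelated random unit vectors have near-zero similarity.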