# Vestir Clothing Similarity Model v2

A MobileNetV3-Small backbone with an embedding head that produces 128-dim L2-normalized embeddings for visual clothing similarity.

## Model Details

| Property | Value |
|---|---|
| Architecture | MobileNetV3-Small + embedding head |
| Parameters | 1.1M |
| Embedding dim | 128 |
| Input size | 224x224 RGB |
| ONNX INT8 size | 1.2 MB |
| ONNX FP32 size | 4.2 MB |

## Training (v2)

Key improvements over v1:

- Triplet loss (anchor/positive/negative) instead of contrastive loss
- Hard negative mining: 80% of negatives are same-color, different-subcategory pairs
- Strong color jitter (0.6 brightness/contrast/saturation) forces shape learning over color
- Subcategory-level labels instead of broad master categories

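The loss and mining rule above can be sketched in NumPy. This is an illustrative sketch, not the actual training code: `mine_hard_negatives`, the `color`/`subcategory` item fields, and the sampling scheme are hypothetical names standing in for the real pipeline.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.5):
    """Triplet loss on embeddings: push the anchor-negative distance to be
    at least `margin` larger than the anchor-positive distance."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

def mine_hard_negatives(items, anchor, rng, hard_frac=0.8):
    """Pick a negative for `anchor`: with probability `hard_frac`, a
    same-color, different-subcategory item (the hard case); otherwise
    any item from a different subcategory."""
    hard = [it for it in items
            if it["color"] == anchor["color"]
            and it["subcategory"] != anchor["subcategory"]]
    easy = [it for it in items if it["subcategory"] != anchor["subcategory"]]
    pool = hard if hard and rng.random() < hard_frac else easy
    return pool[rng.integers(len(pool))]
```

The hard-negative bias is what forces the model to separate, say, a red t-shirt from a red hoodie by shape rather than color.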
Dataset: ashraq/fashion-product-images-small with 5000 training and 1000 validation images.

Training config:

- Optimizer: AdamW (lr=1e-4, weight_decay=0.01)
- Loss: TripletMarginLoss (margin=0.5)
- Augmentation: RandomCrop, HorizontalFlip, ColorJitter(0.8), RandomGrayscale(0.1)
- Hardware: NVIDIA GTX 1050 Ti (4 GB VRAM)
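The config above can be wired together as a short PyTorch training step. This is a sketch under assumptions, not the actual training script: `train_step` and the stand-in model are hypothetical, while the AdamW and TripletMarginLoss settings come from the list above.

```python
import torch
import torch.nn.functional as F

def make_optimizer(model):
    # AdamW with the card's settings.
    return torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

def train_step(model, optimizer, loss_fn, anchor, positive, negative):
    """One triplet step: embed all three batches, L2-normalize, backprop."""
    optimizer.zero_grad()
    emb_a = F.normalize(model(anchor), dim=-1)
    emb_p = F.normalize(model(positive), dim=-1)
    emb_n = F.normalize(model(negative), dim=-1)
    loss = loss_fn(emb_a, emb_p, emb_n)
    loss.backward()
    optimizer.step()
    return loss.item()

# loss_fn = torch.nn.TripletMarginLoss(margin=0.5)
```

Normalizing before the loss keeps all distances on the unit sphere, which matches the L2-normalized embeddings the model emits at inference time.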

## Metrics

### v2 (current)

| Metric | Hard-negative eval | Standard eval |
|---|---|---|
| Accuracy | 88.9% | 92.4% |
| F1 | 0.897 | 0.948 |
| Precision | 0.837 | 0.945 |
| Optimal threshold | 0.78 | 0.65 |

### v1 (previous)

| Metric | Hard-negative eval | Standard eval |
|---|---|---|
| Accuracy | 82.5% | 95.1% |
| F1 | 0.838 | 0.967 |
| diff_subcat_same_color acc | 6.2% | N/A |

v2 markedly improves the hardest cases (same color, different garment type), lifting hard-negative accuracy from 82.5% to 88.9%, at the cost of some easy-case accuracy (95.1% down to 92.4% on the standard eval).

## Usage

### ONNX Runtime (browser)
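A minimal Python sketch of the preprocess → run → compare flow, for reference; browser code with `onnxruntime-web` follows the same steps. The ImageNet normalization constants and the `vestir_int8.onnx` file name are assumptions, not stated on the card.

```python
import numpy as np

def preprocess(rgb_image):
    """HxWx3 uint8 RGB (already resized to 224x224) -> 1x3x224x224 float32.

    ImageNet mean/std normalization is an assumption; the card does not
    state the preprocessing constants.
    """
    x = rgb_image.astype(np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    x = (x - mean) / std
    return x.transpose(2, 0, 1)[None]  # HWC -> NCHW

def cosine_similarity(a, b):
    # Embeddings come out L2-normalized, so cosine similarity is a dot product.
    return float(np.dot(a, b))

# Running the INT8 model (file name is an assumption):
# import onnxruntime as ort
# session = ort.InferenceSession("vestir_int8.onnx")
# name = session.get_inputs()[0].name
# emb = session.run(None, {name: preprocess(img)})[0][0]
# Treat a pair as similar when cosine_similarity(e1, e2) >= 0.78
# (the hard-negative optimal threshold from the metrics table).
```

In the browser, `onnxruntime-web` mirrors this: create an inference session from the INT8 file and feed a `Float32Array` tensor shaped `[1, 3, 224, 224]`.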

### PyTorch
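A minimal inference sketch, assuming the checkpoint loads as a torch `nn.Module`; the `vestir_v2.pt` file name is hypothetical.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def embed(model, batch):
    """Map a preprocessed NCHW float batch to L2-normalized 128-dim embeddings."""
    model.eval()
    return F.normalize(model(batch), dim=-1)

# Loading the released checkpoint (file name is an assumption):
# model = torch.load("vestir_v2.pt", map_location="cpu")
# emb = embed(model, batch)  # batch: (N, 3, 224, 224)
# Cosine similarity between rows of `emb` is just a matrix product: emb @ emb.T
```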

## Files

- INT8 quantized ONNX (1.2 MB, for browser)
- FP32 ONNX (4.2 MB, reference)
- PyTorch checkpoint with training metrics

## Part of Vestir

This model powers the merge suggestion feature in Vestir, a virtual wardrobe app. It identifies visually similar garments in a user's closet that could be merged or consolidated.
