# Vestir Clothing Similarity Model v2

A MobileNetV3-Small backbone with an embedding head that produces 128-dim L2-normalized embeddings for visual clothing similarity.

## Model Details

| Property | Value |
|---|---|
| Architecture | MobileNetV3-Small + embedding head |
| Parameters | 1.1M |
| Embedding dim | 128 |
| Input size | 224x224 RGB |
| ONNX INT8 size | 1.2 MB |
| ONNX FP32 size | 4.2 MB |

## Training (v2)

Key improvements over v1:

- Triplet loss (anchor/positive/negative) instead of contrastive loss
- Hard negative mining: 80% of negatives are same-color, different-subcategory pairs
- Strong color jitter (0.6 brightness/contrast/saturation) forces shape learning over color
- Subcategory-level labels instead of broad master categories

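The loss and mining rule above can be sketched in NumPy. This is an illustrative sketch, not the actual training code: `mine_hard_negatives`, the `color`/`subcategory` item fields, and the sampling scheme are hypothetical names standing in for the real pipeline.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.5):
    """Triplet loss on embeddings: push the anchor-negative distance to be
    at least `margin` larger than the anchor-positive distance."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

def mine_hard_negatives(items, anchor, rng, hard_frac=0.8):
    """Pick a negative for `anchor`: with probability `hard_frac`, a
    same-color, different-subcategory item (the hard case); otherwise
    any item from a different subcategory."""
    hard = [it for it in items
            if it["color"] == anchor["color"]
            and it["subcategory"] != anchor["subcategory"]]
    easy = [it for it in items if it["subcategory"] != anchor["subcategory"]]
    pool = hard if hard and rng.random() < hard_frac else easy
    return pool[rng.integers(len(pool))]
```

The hard-negative bias is what forces the model to separate, say, a red t-shirt from a red hoodie by shape rather than color.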
Dataset: ashraq/fashion-product-images-small with 5000 training and 1000 validation images.

Training config:

- Optimizer: AdamW (lr=1e-4, weight_decay=0.01)
- Loss: TripletMarginLoss (margin=0.5)
- Augmentation: RandomCrop, HorizontalFlip, ColorJitter(0.8), RandomGrayscale(0.1)
- Hardware: NVIDIA GTX 1050 Ti (4 GB VRAM)
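The config above can be wired together as a short PyTorch training step. This is a sketch under assumptions, not the actual training script: `train_step` and the stand-in model are hypothetical, while the AdamW and TripletMarginLoss settings come from the list above.

```python
import torch
import torch.nn.functional as F

def make_optimizer(model):
    # AdamW with the card's settings.
    return torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

def train_step(model, optimizer, loss_fn, anchor, positive, negative):
    """One triplet step: embed all three batches, L2-normalize, backprop."""
    optimizer.zero_grad()
    emb_a = F.normalize(model(anchor), dim=-1)
    emb_p = F.normalize(model(positive), dim=-1)
    emb_n = F.normalize(model(negative), dim=-1)
    loss = loss_fn(emb_a, emb_p, emb_n)
    loss.backward()
    optimizer.step()
    return loss.item()

# loss_fn = torch.nn.TripletMarginLoss(margin=0.5)
```

Normalizing before the loss keeps all distances on the unit sphere, which matches the L2-normalized embeddings the model emits at inference time.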

## Metrics

### v2 (current)

| Metric | Hard-negative eval | Standard eval |
|---|---|---|
| Accuracy | 88.9% | 92.4% |
| F1 | 0.897 | 0.948 |
| Precision | 0.837 | 0.945 |
| Optimal threshold | 0.78 | 0.65 |

### v1 (previous)

| Metric | Hard-negative eval | Standard eval |
|---|---|---|
| Accuracy | 82.5% | 95.1% |
| F1 | 0.838 | 0.967 |
| diff_subcat_same_color acc | 6.2% | N/A |

v2 markedly improves the hardest cases (same color, different garment type), lifting hard-negative accuracy from 82.5% to 88.9%, at the cost of some easy-case accuracy (95.1% down to 92.4% on the standard eval).

## Usage

### ONNX Runtime (browser)
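A minimal Python sketch of the preprocess → run → compare flow, for reference; browser code with `onnxruntime-web` follows the same steps. The ImageNet normalization constants and the `vestir_int8.onnx` file name are assumptions, not stated on the card.

```python
import numpy as np

def preprocess(rgb_image):
    """HxWx3 uint8 RGB (already resized to 224x224) -> 1x3x224x224 float32.

    ImageNet mean/std normalization is an assumption; the card does not
    state the preprocessing constants.
    """
    x = rgb_image.astype(np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    x = (x - mean) / std
    return x.transpose(2, 0, 1)[None]  # HWC -> NCHW

def cosine_similarity(a, b):
    # Embeddings come out L2-normalized, so cosine similarity is a dot product.
    return float(np.dot(a, b))

# Running the INT8 model (file name is an assumption):
# import onnxruntime as ort
# session = ort.InferenceSession("vestir_int8.onnx")
# name = session.get_inputs()[0].name
# emb = session.run(None, {name: preprocess(img)})[0][0]
# Treat a pair as similar when cosine_similarity(e1, e2) >= 0.78
# (the hard-negative optimal threshold from the metrics table).
```

In the browser, `onnxruntime-web` mirrors this: create an inference session from the INT8 file and feed a `Float32Array` tensor shaped `[1, 3, 224, 224]`.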

### PyTorch
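A minimal inference sketch, assuming the checkpoint loads as a torch `nn.Module`; the `vestir_v2.pt` file name is hypothetical.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def embed(model, batch):
    """Map a preprocessed NCHW float batch to L2-normalized 128-dim embeddings."""
    model.eval()
    return F.normalize(model(batch), dim=-1)

# Loading the released checkpoint (file name is an assumption):
# model = torch.load("vestir_v2.pt", map_location="cpu")
# emb = embed(model, batch)  # batch: (N, 3, 224, 224)
# Cosine similarity between rows of `emb` is just a matrix product: emb @ emb.T
```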

## Files

- INT8 quantized ONNX (1.2 MB, for browser)
- FP32 ONNX (4.2 MB, reference)
- PyTorch checkpoint with training metrics

## Part of Vestir

This model powers the merge suggestion feature in Vestir, a virtual wardrobe app. It identifies visually similar garments in a user's closet that could be merged or consolidated.
