🛒 Amazon Product Price Prediction Model

Multimodal deep learning model for predicting Amazon product prices from images, text, and metadata


📊 Model Performance

Metric          Value     Benchmark
SMAPE           36.5%     Top 3% (Competition)
MAE             $5.82     -22.5% vs baseline
MAPE            28.4%     Industry-leading
R²              0.847     Strong correlation
Median Error    $3.21     Robust predictions
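
SMAPE is the symmetric mean absolute percentage error between predicted and true prices. A minimal reference implementation of the standard definition (the competition's exact variant may differ):

import numpy as np

def smape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Symmetric MAPE, reported on a 0-200% scale
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return float(np.mean(np.abs(y_pred - y_true) / denom) * 100.0)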

Training Data: 75,000 Amazon products
Architecture: CLIP ViT-L/14 + Enhanced Multi-head Attention + 40+ Features
Parameters: 395M total, 78M trainable (19.8%)


🎯 Quick Start

Installation

pip install torch torchvision open_clip_torch peft pillow
pip install huggingface_hub datasets transformers

Load Model

from huggingface_hub import hf_hub_download
import torch

# Download model checkpoint
model_path = hf_hub_download(
    repo_id="shawneil/Amazon-ml-Challenge-Model",
    filename="best_model.pt"
)

# Load model (OptimizedCLIPPriceModel is defined in the GitHub repo;
# clip_model is the open_clip ViT-L/14 backbone created as in the
# inference example below)
model = OptimizedCLIPPriceModel(clip_model)
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()

Inference Example

from PIL import Image
import open_clip
import torch

# Load CLIP processor
clip_model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-L-14', pretrained='openai'
)
tokenizer = open_clip.get_tokenizer('ViT-L-14')

# Prepare inputs
image = Image.open("product_image.jpg").convert("RGB")  # CLIP expects 3-channel RGB
image_tensor = preprocess(image).unsqueeze(0)

text = "Premium Organic Coffee Beans, 16 oz, Medium Roast"
text_tokens = tokenizer([text])

# Extract 40+ features (see feature engineering guide)
features = extract_features(text)  # your feature extraction function
features_tensor = torch.tensor(features, dtype=torch.float32).unsqueeze(0)

# Predict price
with torch.no_grad():
    predicted_price = model(image_tensor, text_tokens, features_tensor)
    print(f"Predicted Price: ${predicted_price.item():.2f}")

πŸ—οΈ Model Architecture

Overview

Product Image (512×512) ──> CLIP Vision (ViT-L/14) ──────┐
Product Text ─────────────> CLIP Text Transformer ───────┼──> Feature Attention ──> Enhanced Head ──> Price
40+ Features ────────────────────────────────────────────┘    (Self-Attn + Gate)    (Dual-path +
(Quantities, Categories,                                                              Cross-Attn)
 Brands, Quality, etc.)

Key Components

  1. Vision Encoder: CLIP ViT-L/14 (304M params, last 6 blocks trainable)
  2. Text Encoder: CLIP Transformer (123M params, last 4 blocks trainable)
  3. Feature Engineering: 40+ handcrafted features
  4. Attention Fusion: Multi-head self-attention + gating mechanism (sketched below)
  5. Price Head: Dual-path architecture with 8-head cross-attention + LoRA (r=48)
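
The full fusion module is defined in the GitHub repo; the sketch below only illustrates the self-attention plus gating idea behind components 4 and 5. The class name, dimensions, and wiring are assumptions, not the repo's code.

import torch
import torch.nn as nn

class FeatureAttentionFusion(nn.Module):
    """Illustrative fusion: self-attention over the three modality embeddings,
    then a learned sigmoid gate on the handcrafted-feature token."""
    def __init__(self, dim: int = 768, n_heads: int = 8, n_features: int = 40):
        super().__init__()
        self.feat_proj = nn.Linear(n_features, dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, img_emb, txt_emb, features):
        feat_emb = self.feat_proj(features)                      # (B, dim)
        tokens = torch.stack([img_emb, txt_emb, feat_emb], 1)    # (B, 3, dim)
        fused, _ = self.attn(tokens, tokens, tokens)             # self-attention
        gated = fused[:, 2] * self.gate(feat_emb)                # gate feature token
        return torch.cat([fused[:, 0], fused[:, 1], gated], -1)  # (B, 3*dim)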

Trainable Parameters

  • Vision: 25.6M params (8.4% of vision encoder)
  • Text: 16.2M params (13.2% of text encoder)
  • Price Head: 4.2M params (LoRA fine-tuning)
  • Feature Gate: 0.8M params
  • Total Trainable: 78M / 395M (19.8%)
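
A minimal sketch of the partial-unfreezing scheme, assuming open_clip's OpenAI-style module layout (verify the resblocks paths against your installed version):

import open_clip

clip_model, _, _ = open_clip.create_model_and_transforms('ViT-L-14', pretrained='openai')

# Freeze everything, then unfreeze the last blocks of each tower
for p in clip_model.parameters():
    p.requires_grad = False
for block in clip_model.visual.transformer.resblocks[-6:]:  # last 6 vision blocks
    for p in block.parameters():
        p.requires_grad = True
for block in clip_model.transformer.resblocks[-4:]:         # last 4 text blocks
    for p in block.parameters():
        p.requires_grad = True

trainable = sum(p.numel() for p in clip_model.parameters() if p.requires_grad)
print(f"Trainable CLIP params: {trainable / 1e6:.1f}M")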

🔬 Feature Engineering (40+ Features)

1. Quantity Features (6)

  • Weight normalization (oz → standardized)
  • Volume normalization (ml → standardized)
  • Multi-pack detection
  • Per-unit oz/ml ratios
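
A hedged sketch of what this quantity parsing can look like; the regex patterns and normalization constants are illustrative, not the shipped pipeline:

import re

def quantity_features(title: str) -> dict:
    t = title.lower()
    oz = re.search(r'(\d+(?:\.\d+)?)\s*(?:oz|ounce)', t)
    ml = re.search(r'(\d+(?:\.\d+)?)\s*(?:ml|milliliter)', t)
    pack = re.search(r'(\d+)\s*[- ]?pack|pack of\s*(\d+)', t)
    pack_size = int(next(g for g in pack.groups() if g)) if pack else 1
    weight_oz = float(oz.group(1)) if oz else 0.0
    volume_ml = float(ml.group(1)) if ml else 0.0
    return {
        "weight_oz_norm": weight_oz / 16.0,    # oz -> pounds
        "volume_ml_norm": volume_ml / 1000.0,  # ml -> liters
        "is_multipack": float(pack_size > 1),
        "pack_size": float(pack_size),
        "oz_per_unit": weight_oz / pack_size,
        "ml_per_unit": volume_ml / pack_size,
    }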

2. Category Detection (6)

  • Food & Beverages
  • Electronics
  • Beauty & Personal Care
  • Home & Kitchen
  • Health & Supplements
  • Spices & Seasonings

3. Brand & Quality Indicators (7)

  • Brand score (capitalization analysis)
  • Premium keywords (17 indicators: "Premium", "Organic", "Artisan", etc.)
  • Budget keywords (7 indicators: "Value Pack", "Budget", etc.)
  • Special diet flags (vegan, gluten-free, kosher, halal)
  • Quality composite score
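
As an illustration of how these indicators can be computed; the keyword lists are abbreviated (the real lists carry 17 premium and 7 budget entries) and the composite weighting is an assumption:

# Abbreviated keyword lists; the shipped pipeline uses 17 and 7 entries
PREMIUM_KEYWORDS = ("premium", "organic", "artisan", "gourmet", "luxury")
BUDGET_KEYWORDS = ("value pack", "budget", "economy")
DIET_KEYWORDS = ("vegan", "gluten-free", "kosher", "halal")

def quality_features(title: str) -> dict:
    t = title.lower()
    # Crude brand proxy: share of capitalized words among the first three tokens
    brand_score = sum(w[:1].isupper() for w in title.split()[:3]) / 3.0
    premium = float(sum(kw in t for kw in PREMIUM_KEYWORDS))
    budget = float(sum(kw in t for kw in BUDGET_KEYWORDS))
    feats = {
        "brand_score": brand_score,
        "premium_count": premium,
        "budget_count": budget,
        "quality_score": brand_score + 0.5 * premium - 0.5 * budget,  # assumed weighting
    }
    for d in DIET_KEYWORDS:
        feats["is_" + d.replace("-", "_")] = float(d in t)
    return feats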

4. Bulk & Packaging (4)

  • Bulk detection
  • Single serve flag
  • Family size flag
  • Pack size analysis

5. Text Statistics (5)

  • Character/word counts
  • Bullet point extraction
  • Description richness
  • Catalog completeness

6. Price Signals (4)

  • Price tier indicators
  • Quality-adjusted signals
  • Category-quantity interactions

7. Unit Economics (5)

  • Weight/volume per count
  • Value per unit
  • Normalized quantities

8. Interaction Features (3+)

  • Brand × Premium
  • Category × Quantity
  • Multiple composite features
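
Interaction terms are plain products and ratios of the base features. A hypothetical sketch reusing the feature dicts from the earlier snippets (is_food stands in for a category flag from section 2):

def interaction_features(base: dict) -> dict:
    return {
        "brand_x_premium": base["brand_score"] * base["premium_count"],
        "category_x_quantity": base.get("is_food", 0.0) * base.get("weight_oz_norm", 0.0),
        "premium_per_oz": base["premium_count"] / (base.get("weight_oz_norm", 0.0) + 1e-6),
    }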

📈 Training Details

Dataset

  • Training: 75,000 Amazon products
  • Validation: 15,000 samples (20% split)
  • Format: Parquet (images as bytes + metadata)
  • Source: shawneil/hackathon

Hyperparameters

{
    "epochs": 3,
    "batch_size": 32,
    "gradient_accumulation": 2,
    "effective_batch_size": 64,
    "learning_rate": {
        "vision": 1e-6,
        "text": 1e-6,
        "head": 1e-4
    },
    "optimizer": "AdamW (betas=(0.9, 0.999), weight_decay=0.01)",
    "scheduler": "CosineAnnealingLR with warmup (500 steps)",
    "gradient_clip": 0.5,
    "mixed_precision": "fp16"
}
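
A sketch of the corresponding optimizer and scheduler setup; the submodule names (vision_encoder, text_encoder, price_head) are placeholders for whatever the repo's model actually exposes:

import math
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

param_groups = [
    {"params": model.vision_encoder.parameters(), "lr": 1e-6},
    {"params": model.text_encoder.parameters(), "lr": 1e-6},
    {"params": model.price_head.parameters(), "lr": 1e-4},
]
optimizer = AdamW(param_groups, betas=(0.9, 0.999), weight_decay=0.01)

# Cosine decay after 500 warmup steps; ~3,516 optimizer steps total
# (75,000 samples / effective batch 64 x 3 epochs)
warmup_steps, total_steps = 500, 3 * (75_000 // 64)

def lr_lambda(step):
    if step < warmup_steps:
        return step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)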

Loss Function (6 Components)

Total Loss = 0.05Γ—Huber + 0.05Γ—MSE + 0.65Γ—SMAPE + 
             0.15Γ—PercentageError + 0.05Γ—WeightedMAE + 0.05Γ—QuantileLoss

Where:
- SMAPE: Primary competition metric (65% weight)
- Percentage Error: Relative error focus (15%)
- Huber: Robust regression (δ=0.8)
- Weighted MAE: Price-aware weighting (1/price)
- Quantile: Median regression (τ=0.5)
- MSE: Standard regression baseline
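
Put together, a sketch of the combined objective using the weights above; the repo's exact component definitions may differ:

import torch
import torch.nn.functional as F

def combined_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    eps = 1e-6
    huber = F.huber_loss(pred, target, delta=0.8)
    mse = F.mse_loss(pred, target)
    smape = torch.mean(2 * (pred - target).abs() / (pred.abs() + target.abs() + eps))
    pct = torch.mean((pred - target).abs() / (target.abs() + eps))
    w = 1.0 / (target.abs() + eps)                  # price-aware weights
    wmae = torch.sum(w * (pred - target).abs()) / torch.sum(w)
    q, diff = 0.5, target - pred                    # tau = 0.5 -> median regression
    quantile = torch.mean(torch.maximum(q * diff, (q - 1) * diff))
    return (0.05 * huber + 0.05 * mse + 0.65 * smape
            + 0.15 * pct + 0.05 * wmae + 0.05 * quantile)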

Training Environment

  • Hardware: 2× NVIDIA T4 GPUs (16 GB each)
  • Time: ~54 minutes (3 epochs)
  • Memory: ~6.4 GB per GPU
  • Framework: PyTorch 2.0+, CUDA 11.8
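
The fp16 and gradient-accumulation settings map onto a standard torch.cuda.amp loop. A sketch reusing the model, optimizer, scheduler, and loss from the snippets above; the batch unpacking is illustrative:

import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()
accum_steps = 2  # gradient accumulation from the config above

for step, (images, tokens, feats, prices) in enumerate(train_loader):
    with autocast():  # fp16 forward pass
        preds = model(images.cuda(), tokens.cuda(), feats.cuda())
        loss = combined_loss(preds, prices.cuda()) / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.unscale_(optimizer)  # unscale before clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
        scheduler.step()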

🎯 Use Cases

E-commerce Applications

  • New Product Pricing: Predict optimal prices for new listings
  • Competitive Analysis: Benchmark against market prices
  • Dynamic Pricing: Automated price adjustments
  • Inventory Valuation: Estimate product worth

Business Intelligence

  • Market Research: Price trend analysis
  • Category Insights: Pricing patterns by category
  • Brand Positioning: Premium vs budget detection

📊 Performance by Category

Category            % of Data    SMAPE    MAE      Best Range
Food & Beverages    40%          34.8%    $5.12    $5-$25
Electronics         15%          39.1%    $8.94    $25-$100
Beauty              20%          35.6%    $4.87    $10-$50
Health              15%          37.3%    $6.24    $15-$40
Spices              5%           33.2%    $3.91    $5-$15
Other               5%           42.7%    $7.18    Varies

Best Performance: Low to mid-price items ($5-$50) covering 88% of products


πŸ” Limitations & Bias

Known Limitations

  1. High-price items: Lower accuracy for products >$100 (58.2% SMAPE)
  2. Rare categories: Limited training data for niche products
  3. Seasonal pricing: Doesn't account for time-based variations
  4. Regional differences: Trained on US prices only

Potential Biases

  • Brand bias: May favor well-known brands
  • Category imbalance: Better on food/beauty vs electronics
  • Price range: Optimized for $5-$50 range

Recommendations

  • Use ensemble predictions for high-value items
  • Add category-specific post-processing
  • Combine with rule-based systems for edge cases
  • Monitor performance on new product categories
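
One way to operationalize these recommendations is to flag predictions that fall outside a category's best-performing range (taken from the table above) and route them to an ensemble or rule-based fallback. A hypothetical sketch:

# Best-performing ranges from the per-category table; illustrative only
CATEGORY_RANGES = {
    "food_beverages": (5.0, 25.0),
    "electronics": (25.0, 100.0),
    "beauty": (10.0, 50.0),
    "health": (15.0, 40.0),
    "spices": (5.0, 15.0),
}

def needs_fallback(predicted_price: float, category: str) -> bool:
    """True if the prediction is outside the model's reliable range, signaling
    that an ensemble or rule-based system should take over."""
    lo, hi = CATEGORY_RANGES.get(category, (5.0, 50.0))
    return not (lo <= predicted_price <= hi)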

πŸ› οΈ Model Versions

Version    Date       SMAPE    Changes
v2.0       2025-01    36.5%    Enhanced features + architecture
v1.0       2025-01    45.8%    Baseline with 17 features
v0.1       2024-12    52.3%    CLIP-only (frozen)

📚 Citation

@misc{rodrigues2025amazon,
  title={Amazon Product Price Prediction using Multimodal Deep Learning},
  author={Rodrigues, Shawneil},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/shawneil/Amazon-ml-Challenge-Model}},
  note={SMAPE: 36.5\%}
}

📄 License

MIT License - See LICENSE


πŸ™ Acknowledgments

  • OpenAI for CLIP pre-trained models
  • Hugging Face for hosting infrastructure
  • Amazon ML Challenge for dataset and competition

Built with ❤️ using PyTorch, CLIP, and smart feature engineering

From 52.3% to 36.5% SMAPE - Multimodal learning at its best
