LayoutLMv3 Receipt Parser

A fine-tuned LayoutLMv3 model for extracting structured information from receipt images with 89.34% validation accuracy.

Model Details

  • Model: albertosei/layoutlmv3-receipt-parser
  • Architecture: LayoutLMv3-base
  • Task: Token Classification (Named Entity Recognition)
  • Languages: English
  • Training Data: 1,426 receipt samples
  • Validation: 100 samples
  • License: Apache 2.0

Performance Metrics

Metric                           Value
Final Validation Accuracy        89.34%
Training Loss (Epoch 1)          0.6824
Training Loss (Epoch 2)          0.3278
Validation Accuracy (Epoch 1)    83.49%
Number of Entity Labels          51

Entity Labels

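The model recognizes 25 entity types, tagged in BIO format. Each type has a B- (beginning) and an I- (inside) tag, which together with the single O (outside) tag accounts for the 51 labels reported under Performance Metrics.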

Vendor Information

  • vendor_name - Store/business name
  • vendor_address - Physical address
  • vendor_phone_number - Contact number

Date & Time

  • date - Transaction date
  • time - Transaction time

Receipt Details

  • receipt_id - Receipt number/identifier
  • currency - Currency type

Financial Amounts

  • total_amount - Final total
  • subtotal_amount - Subtotal before tax
  • tax_amount - Tax amount
  • service_charge_amount - Service fees
  • discount_amount - Discounts applied
  • tip_amount - Tip/gratuity

Payment Information

  • cash_paid_amount - Cash payment
  • change_amount - Change returned
  • credit_card_amount - Credit card payment
  • e_money_amount - Electronic payment
  • payment_method - Payment type

Line Items

  • line_item_name - Product/service name
  • line_item_quantity - Quantity purchased
  • line_item_unit_price - Price per unit
  • line_item_total_price - Line item total
  • line_item_discount_amount - Item-level discount
  • line_item_vat_status - VAT information

Other

  • other - Miscellaneous information
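
The relationship between the 25 entity types and the 51 labels can be sketched in a few lines of Python. This is an illustration only: the `B-`/`I-` prefix spelling, the label ordering, and the names `ENTITY_TYPES` and `build_bio_labels` are assumptions, and the authoritative mapping is whatever `model.config.id2label` contains.

```python
# Entity types as listed in this card (the order here is arbitrary).
ENTITY_TYPES = [
    "vendor_name", "vendor_address", "vendor_phone_number",
    "date", "time",
    "receipt_id", "currency",
    "total_amount", "subtotal_amount", "tax_amount",
    "service_charge_amount", "discount_amount", "tip_amount",
    "cash_paid_amount", "change_amount", "credit_card_amount",
    "e_money_amount", "payment_method",
    "line_item_name", "line_item_quantity", "line_item_unit_price",
    "line_item_total_price", "line_item_discount_amount",
    "line_item_vat_status",
    "other",
]


def build_bio_labels(entity_types):
    """Return ["O", "B-x", "I-x", ...] for the given entity types."""
    labels = ["O"]
    for name in entity_types:
        labels.append(f"B-{name}")  # first token of an entity span
        labels.append(f"I-{name}")  # continuation token of the same span
    return labels


labels = build_bio_labels(ENTITY_TYPES)
print(len(ENTITY_TYPES), len(labels))
```

Two tags per type plus one `O` tag gives 2 × 25 + 1 = 51, which matches the label count in the metrics table.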

Usage

from transformers import AutoProcessor, AutoModelForTokenClassification
from PIL import Image
import torch

# Load model and processor
processor = AutoProcessor.from_pretrained("albertosei/layoutlmv3-receipt-parser", apply_ocr=False)
model = AutoModelForTokenClassification.from_pretrained("albertosei/layoutlmv3-receipt-parser")

# Prepare inputs (requires external OCR for text and bounding boxes).
# Boxes are [x0, y0, x1, y1], normalized to the 0-1000 range LayoutLMv3 expects.
image = Image.open("receipt.jpg").convert("RGB")
words = ["STORE", "NAME", "Date:", "2024-01-01", "Total:", "25.99"]  # From OCR
boxes = [[0, 0, 100, 20], [100, 0, 200, 20], [0, 20, 50, 40],
         [50, 20, 150, 40], [0, 40, 50, 60], [50, 40, 150, 60]]  # From OCR

# Process and predict
encoding = processor(image, words, boxes=boxes, return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoding)
# One prediction per token (special and sub-word tokens included), not per word
predictions = outputs.logits.argmax(-1).squeeze(0).tolist()

# Convert to labels
predicted_labels = [model.config.id2label[pred] for pred in predictions]
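
Two steps around the snippet above are easy to get wrong: OCR engines usually return pixel coordinates, while LayoutLMv3 expects boxes scaled to 0-1000, and the model emits one label per token rather than one per OCR word. A minimal sketch of both steps follows. The helper names `normalize_box` and `first_token_labels` are hypothetical, and keeping the first sub-word token's label for each word is one common convention, not necessarily the one used when this model was trained or evaluated. The sketch also assumes the processor wraps a fast tokenizer, so that `encoding.word_ids()` is available.

```python
def normalize_box(box, width, height):
    """Scale a pixel-space [x0, y0, x1, y1] box to the 0-1000 range."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def first_token_labels(word_ids, token_labels):
    """Keep the label of the first token of each word.

    word_ids: per-token word index, None for special tokens.
    token_labels: per-token label strings, same length as word_ids.
    """
    word_labels = []
    previous = None
    for word_id, label in zip(word_ids, token_labels):
        # A new word starts whenever the word index changes to a non-None value.
        if word_id is not None and word_id != previous:
            word_labels.append(label)
        previous = word_id
    return word_labels


# Toy example: a special token, "STORE" split into two tokens, "NAME", a special token.
word_ids = [None, 0, 0, 1, None]
token_labels = ["O", "B-vendor_name", "I-vendor_name", "I-vendor_name", "O"]
print(first_token_labels(word_ids, token_labels))

# A 200x50 pixel box on a 500x1000 pixel image.
print(normalize_box([50, 100, 250, 150], width=500, height=1000))
```

With the usage snippet above, `word_ids` would come from `encoding.word_ids(batch_index=0)`, `token_labels` from `predicted_labels`, and `width` and `height` from `image.size`.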