LayoutLMv3 Receipt Parser
A fine-tuned LayoutLMv3 model for extracting structured information from receipt images with 89.34% validation accuracy.
Model Details
- Model: albertosei/layoutlmv3-receipt-parser
- Architecture: LayoutLMv3-base
- Task: Token Classification (Named Entity Recognition)
- Languages: English
- Training Data: 1,426 receipt samples
- Validation: 100 samples
- License: Apache 2.0
Performance Metrics
| Metric | Value |
|---|---|
| Final Validation Accuracy | 89.34% |
| Training Loss (Epoch 1) | 0.6824 |
| Training Loss (Epoch 2) | 0.3278 |
| Validation Accuracy (Epoch 1) | 83.49% |
| Number of Entity Labels | 51 |
Entity Labels
The model recognizes 25 entity types, tagged in BIO format (51 labels in total):
Vendor Information
- vendor_name - Store/business name
- vendor_address - Physical address
- vendor_phone_number - Contact number
Date & Time
- date - Transaction date
- time - Transaction time
Receipt Details
- receipt_id - Receipt number/identifier
- currency - Currency type
Financial Amounts
- total_amount - Final total
- subtotal_amount - Subtotal before tax
- tax_amount - Tax amount
- service_charge_amount - Service fees
- discount_amount - Discounts applied
- tip_amount - Tip/gratuity
Payment Information
- cash_paid_amount - Cash payment
- change_amount - Change returned
- credit_card_amount - Credit card payment
- e_money_amount - Electronic payment
- payment_method - Payment type
Line Items
- line_item_name - Product/service name
- line_item_quantity - Quantity purchased
- line_item_unit_price - Price per unit
- line_item_total_price - Line item total
- line_item_discount_amount - Item-level discount
- line_item_vat_status - VAT information
Other
- other - Miscellaneous information
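The 25 types listed above match the 51 labels reported in the metrics table if each type gets a `B-` and an `I-` tag and a single `O` tag covers everything else (25 × 2 + 1 = 51). A minimal sketch of that expansion, assuming this tagging scheme: `ENTITY_TYPES` and `build_bio_labels` are illustrative names, and the label spelling and id order here are not necessarily the model's, so read `model.config.id2label` for the actual mapping.

```python
# Entity types as listed in this model card (assumed spelling).
ENTITY_TYPES = [
    "vendor_name", "vendor_address", "vendor_phone_number",
    "date", "time",
    "receipt_id", "currency",
    "total_amount", "subtotal_amount", "tax_amount",
    "service_charge_amount", "discount_amount", "tip_amount",
    "cash_paid_amount", "change_amount", "credit_card_amount",
    "e_money_amount", "payment_method",
    "line_item_name", "line_item_quantity", "line_item_unit_price",
    "line_item_total_price", "line_item_discount_amount",
    "line_item_vat_status",
    "other",
]


def build_bio_labels(entity_types):
    """Expand entity types into BIO labels: one O tag plus B-/I- per type."""
    labels = ["O"]
    for name in entity_types:
        labels.append(f"B-{name}")  # first word of an entity
        labels.append(f"I-{name}")  # continuation of the same entity
    return labels


labels = build_bio_labels(ENTITY_TYPES)
# Hypothetical id assignment; the fine-tuned model may order labels differently.
label2id = {label: i for i, label in enumerate(labels)}
print(len(ENTITY_TYPES), len(labels))
```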
Usage
from transformers import AutoProcessor, AutoModelForTokenClassification
from PIL import Image
import torch
# Load model and processor
processor = AutoProcessor.from_pretrained("albertosei/layoutlmv3-receipt-parser", apply_ocr=False)
model = AutoModelForTokenClassification.from_pretrained("albertosei/layoutlmv3-receipt-parser")
# Prepare inputs (requires external OCR for text and bounding boxes)
image = Image.open("receipt.jpg").convert("RGB")
words = ["STORE", "NAME", "Date:", "2024-01-01", "Total:", "25.99"] # From OCR
# LayoutLMv3 expects each box as [x0, y0, x1, y1] normalized to a 0-1000 scale
boxes = [[0, 0, 100, 20], [100, 0, 200, 20], [0, 20, 50, 40],
         [50, 20, 150, 40], [0, 40, 50, 60], [50, 40, 150, 60]]  # From OCR
# Process and predict
encoding = processor(image, words, boxes=boxes, return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoding)
predictions = outputs.logits.argmax(-1).squeeze().tolist()
# Convert to labels
predicted_labels = [model.config.id2label[pred] for pred in predictions]
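The `predicted_labels` list has one entry per token, including special tokens and subword pieces, so it is usually longer than `words`. A minimal sketch of collapsing it to one label per word and then grouping BIO tags into fields follows. It assumes labels of the form `B-<field>` / `I-<field>` / `O`, and it keeps the first subword's prediction for each word, a common convention that is not necessarily the one used for the reported accuracy. `word_level_labels` and `group_entities` are hypothetical helper names, and the `id2label` mapping below is toy data, not the model's.

```python
def word_level_labels(word_ids, predictions, id2label, num_words):
    """Keep the prediction of the first token of each word."""
    labels = ["O"] * num_words  # words that produced no token stay "O"
    seen = set()
    for word_id, pred in zip(word_ids, predictions):
        if word_id is None or word_id in seen:
            continue  # skip special tokens and non-first subword pieces
        seen.add(word_id)
        labels[word_id] = id2label[pred]
    return labels


def group_entities(words, labels):
    """Merge consecutive B-/I- tagged words into (field, text) pairs."""
    entities = []
    current_type, current_words = None, []
    for word, label in zip(words, labels):
        prefix, _, field = label.partition("-")
        if prefix == "I" and field == current_type:
            current_words.append(word)  # continue the open entity
            continue
        if current_type is not None:
            entities.append((current_type, " ".join(current_words)))
        if prefix in ("B", "I"):
            # A stray I- tag with no matching open entity starts a new one.
            current_type, current_words = field, [word]
        else:
            current_type, current_words = None, []
    if current_type is not None:
        entities.append((current_type, " ".join(current_words)))
    return entities


# Toy data standing in for real model output.
id2label = {0: "O", 1: "B-vendor_name", 2: "I-vendor_name",
            3: "B-date", 4: "B-total_amount"}
words = ["STORE", "NAME", "Date:", "2024-01-01", "Total:", "25.99"]
word_ids = [None, 0, 1, 1, 2, 3, 3, 4, 5, None]  # None marks special tokens
predictions = [0, 1, 2, 2, 0, 3, 0, 0, 4, 0]

labels = word_level_labels(word_ids, predictions, id2label, len(words))
entities = group_entities(words, labels)
print(entities)
```

With the real model, the inputs would be `encoding.word_ids(batch_index=0)` (available when the processor wraps a fast tokenizer), the `predictions` list from the snippet above, and `model.config.id2label`.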