LayoutLMv3 Receipt Parser

A fine-tuned LayoutLMv3 model that extracts structured information from receipt images, reaching 89.34% accuracy on a 100-sample validation set.

Model Details

Model: albertosei/layoutlmv3-receipt-parser
Architecture: LayoutLMv3-base
Task: Token Classification (Named Entity Recognition)
Languages: English
Training Data: 1,426 receipt samples
Validation: 100 samples
License: Apache 2.0

Performance Metrics

Metric	Value
Final Validation Accuracy	89.34%
Training Loss (Epoch 1)	0.6824
Training Loss (Epoch 2)	0.3278
Validation Accuracy (Epoch 1)	83.49%
Number of Entity Labels	51
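
The card does not state how validation accuracy was computed. A common convention for token classification, assumed here, is the fraction of correctly labeled tokens, skipping positions whose label id is -100 (the value commonly used to mask padding and special tokens). The sketch below illustrates that convention only; token_accuracy is a hypothetical helper, not part of the released training code.

```python
def token_accuracy(predictions, labels, ignore_index=-100):
    """Fraction of positions where prediction == label, skipping ignore_index.

    Assumption: accuracy is token-level; the model card does not confirm this.
    """
    correct = 0
    total = 0
    for pred_seq, label_seq in zip(predictions, labels):
        for pred, label in zip(pred_seq, label_seq):
            if label == ignore_index:
                continue  # masked position (padding or special token)
            total += 1
            correct += int(pred == label)
    return correct / total if total else 0.0


# Toy label ids, not real model output
preds = [[0, 3, 3, 7], [1, 1, 0, 0]]
gold = [[-100, 3, 4, 7], [1, 1, 0, -100]]
print(f"{token_accuracy(preds, gold):.2%}")  # 5 of 6 scored tokens match: 83.33%
```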

Entity Labels

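The model recognizes 25 entity types, tagged in BIO format. Each type has a B- (begin) and an I- (inside) tag, and together with the O (outside) tag this gives the 51 labels reported above: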

Vendor Information

vendor_name - Store/business name
vendor_address - Physical address
vendor_phone_number - Contact number

Date & Time

date - Transaction date
time - Transaction time

Receipt Details

receipt_id - Receipt number/identifier
currency - Currency type

Financial Amounts

total_amount - Final total
subtotal_amount - Subtotal before tax
tax_amount - Tax amount
service_charge_amount - Service fees
discount_amount - Discounts applied
tip_amount - Tip/gratuity

Payment Information

cash_paid_amount - Cash payment
change_amount - Change returned
credit_card_amount - Credit card payment
e_money_amount - Electronic payment
payment_method - Payment type

Line Items

line_item_name - Product/service name
line_item_quantity - Quantity purchased
line_item_unit_price - Price per unit
line_item_total_price - Line item total
line_item_discount_amount - Item-level discount
line_item_vat_status - VAT information

Other

other - Miscellaneous information
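
Because the labels use BIO format, word-level predictions usually need to be merged into entity spans before they are useful. The sketch below shows how the 25 types expand to 51 labels and how consecutive B-/I- tags can be grouped. It assumes label strings of the form B-<type> and I-<type>; the exact strings and their order in model.config.id2label may differ. build_bio_labels and group_entities are hypothetical helpers, and the example labels are hand-written, not model output.

```python
ENTITY_TYPES = [
    "vendor_name", "vendor_address", "vendor_phone_number",
    "date", "time", "receipt_id", "currency",
    "total_amount", "subtotal_amount", "tax_amount",
    "service_charge_amount", "discount_amount", "tip_amount",
    "cash_paid_amount", "change_amount", "credit_card_amount",
    "e_money_amount", "payment_method",
    "line_item_name", "line_item_quantity", "line_item_unit_price",
    "line_item_total_price", "line_item_discount_amount",
    "line_item_vat_status", "other",
]


def build_bio_labels(entity_types):
    """Return ["O", "B-x", "I-x", ...]; the order is an assumption."""
    labels = ["O"]
    for name in entity_types:
        labels += [f"B-{name}", f"I-{name}"]
    return labels


def group_entities(words, labels):
    """Merge consecutive B-/I- tagged words into (entity_type, text) spans."""
    entities = []
    current_type, current_words = None, []
    for word, label in zip(words, labels):
        prefix, _, name = label.partition("-")
        if prefix == "I" and name == current_type:
            current_words.append(word)  # continue the open span
            continue
        if current_type is not None:
            entities.append((current_type, " ".join(current_words)))
        if prefix in ("B", "I"):  # a stray I- tag starts a new span
            current_type, current_words = name, [word]
        else:  # "O" closes any open span
            current_type, current_words = None, []
    if current_type is not None:
        entities.append((current_type, " ".join(current_words)))
    return entities


print(len(build_bio_labels(ENTITY_TYPES)))  # 25 * 2 + 1 = 51

example_words = ["STORE", "NAME", "Date:", "2024-01-01", "Total:", "25.99"]
example_labels = ["B-vendor_name", "I-vendor_name", "O", "B-date", "O", "B-total_amount"]
print(group_entities(example_words, example_labels))
```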

Usage

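from transformers import AutoProcessor, AutoModelForTokenClassification
from PIL import Image
import torch

# Load model and processor (apply_ocr=False: words and boxes come from your own OCR)
processor = AutoProcessor.from_pretrained("albertosei/layoutlmv3-receipt-parser", apply_ocr=False)
model = AutoModelForTokenClassification.from_pretrained("albertosei/layoutlmv3-receipt-parser")

# Prepare inputs (requires external OCR for text and bounding boxes)
image = Image.open("receipt.jpg").convert("RGB")
words = ["STORE", "NAME", "Date:", "2024-01-01", "Total:", "25.99"]  # From OCR
# Boxes are [x0, y0, x1, y1], one per word, normalized to the 0-1000 range LayoutLMv3 expects
boxes = [[0, 0, 100, 20], [100, 0, 200, 20], [0, 20, 50, 40],
         [50, 20, 150, 40], [0, 40, 50, 60], [50, 40, 150, 60]]  # From OCR

# Process and predict
encoding = processor(image, words, boxes=boxes, return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoding)
# One prediction per token (sub-words and special tokens included), not per word
predictions = outputs.logits.argmax(-1)[0].tolist()

# Convert to word-level labels: keep the first sub-token of each word, skip special tokens
# (word_ids() is available when the processor uses a fast tokenizer)
predicted_labels = [None] * len(words)
for token_idx, word_idx in enumerate(encoding.word_ids(batch_index=0)):
    if word_idx is not None and predicted_labels[word_idx] is None:
        predicted_labels[word_idx] = model.config.id2label[predictions[token_idx]]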
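
LayoutLMv3 expects bounding boxes as integers on a 0-1000 scale relative to the image size, while OCR engines usually return pixel coordinates. A minimal sketch of the conversion, assuming the OCR output gives [x0, y0, x1, y1] boxes in pixels; normalize_box is a hypothetical helper and the sample boxes and image size are made up for illustration.

```python
def normalize_box(box, image_width, image_height):
    """Scale a pixel-space [x0, y0, x1, y1] box to integers in 0-1000."""
    x0, y0, x1, y1 = box

    def scale(value, size):
        # Clamp so slightly out-of-bounds OCR boxes stay in the valid range
        return max(0, min(1000, int(1000 * value / size)))

    return [
        scale(x0, image_width),
        scale(y0, image_height),
        scale(x1, image_width),
        scale(y1, image_height),
    ]


# Hypothetical OCR boxes in pixels for an 800 x 1200 image
pixel_boxes = [[40, 30, 360, 90], [40, 120, 200, 170]]
width, height = 800, 1200
normalized_boxes = [normalize_box(b, width, height) for b in pixel_boxes]
print(normalized_boxes)  # [[50, 25, 450, 75], [50, 100, 250, 141]]
```

With PIL, the image dimensions are available as width, height = image.size, and the normalized boxes can then be passed to the processor in place of raw OCR coordinates.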

Model size: 0.1B params (Safetensors)
Tensor type: F32