🍽️ Gemma-3-270M Restaurant Reservation NER Model
A fine-tuned version of Google's Gemma-3-270M that extracts restaurant reservation details from user messages and is built to tolerate noisy ASR (automatic speech recognition) transcripts.
✨ Key Features
- 🎯 Entity Extraction: Identifies three reservation fields: num_people, reservation_date, and phone_num
- 🌐 Bilingual Support: Handles both Chinese and English input
- 🎙️ ASR Robust: Optimized for noisy speech recognition output
- 📱 Phone Focus: Specialized for Taiwanese mobile number extraction
📋 Comprehensive Example: All Three Entities
Complex Input Text: "Hi, I'd like to make a reservation for 2 adults and 3 children on the 15th of next month around 7:30 in the evening, and you can reach me at +886-912-345-678"
Extracted Output:
{
  "num_people": "5",
  "reservation_date": "15th of next month at 7:30 PM",
  "phone_num": "0912345678"
}
Entity Breakdown from this Complex Example:
| Entity | Extracted From Input | Normalized Output |
|---|---|---|
| num_people | "2 adults and 3 children" | "5" (summed total) |
| reservation_date | "15th of next month around 7:30 in the evening" | "15th of next month at 7:30 PM" (normalized time format) |
| phone_num | "+886-912-345-678" | "0912345678" (international format converted to local) |
⚠️ Important Note: Phone Number Handling
This model exclusively extracts Taiwanese 10-digit mobile numbers (09XXXXXXXX format):
✅ Extracted: Mobile numbers with complex variations
- "+886-912-345-678" → 0912345678 (international format)
- "零九一二三四五六七八" → 0912345678 (Chinese characters)
- "09 12 34 56 78" → 0912345678 (spaced format)
❌ Ignored: Non-mobile numbers
- "市話02-1234-5678" → "" (landline)
- "國際電話+1-555-123-4567" → "" (international non-Taiwanese)
- "免付費0800-123-456" → "" (toll-free)
🚀 Quick Start
Installation
pip install transformers torch
Basic Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import json
# Load model and tokenizer
model_name = "Luigi/dinercall-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
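# System prompt, kept in the original Chinese. English translation:
#   "You are an assistant responsible for extracting reservation information
#   from user messages and outputting it as JSON. The JSON must contain three
#   fields: num_people, reservation_date, phone_num. If a field has no
#   information, use an empty string. Output only the JSON, with no other text."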
system_prompt = """你是一個助理,負責從用戶消息中提取預訂資訊並以JSON格式輸出。
JSON必須包含三個字段: num_people, reservation_date, phone_num。
如果某個字段沒有信息,使用空字符串。只輸出JSON,不要添加任何其他文字。"""
# Example with complex input
user_input = "Hi, I'd like to make a reservation for 2 adults and 3 children on the 15th of next month around 7:30 in the evening, and you can reach me at +886-912-345-678"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_input},
]
# Generate response
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,  # greedy decoding; temperature has no effect without sampling
    )
# Process output
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
result = json.loads(response)
print(result)
# Output: {"num_people": "5", "reservation_date": "15th of next month at 7:30 PM", "phone_num": "0912345678"}
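The call to `json.loads` above assumes the model returns nothing but a well-formed JSON object. A small model can occasionally wrap the object in extra text or drop a field, so a defensive parser may be useful. The following is a minimal sketch, not part of the model's own tooling; the helper name `parse_reservation` and the fallback-to-empty-strings behavior are assumptions chosen to mirror the system prompt's rule that missing fields are empty strings.

```python
import json

# The three fields named in the system prompt.
EXPECTED_FIELDS = ("num_people", "reservation_date", "phone_num")


def parse_reservation(response: str) -> dict:
    """Parse model output into a dict holding exactly the three expected fields.

    Hypothetical helper: falls back to empty strings when parsing fails.
    """
    empty = {field: "" for field in EXPECTED_FIELDS}
    # Keep only the text between the first "{" and the last "}".
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end <= start:
        return empty
    try:
        data = json.loads(response[start:end + 1])
    except json.JSONDecodeError:
        return empty
    # Coerce values to strings, fill missing fields, ignore unexpected keys.
    result = {}
    for field in EXPECTED_FIELDS:
        value = data.get(field)
        result[field] = "" if value is None else str(value)
    return result
```

Replacing `json.loads(response)` with `parse_reservation(response)` keeps downstream code working with a fixed set of keys even when the output is malformed.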
🎯 Use Cases
Perfect For
- 🗣️ Voice assistant reservation systems
- 🤖 Chatbot booking interfaces
- 📞 Call center automation
- 📱 Mobile app reservation features
Ideal Input Types
- English: "Book for 6 people next Friday at 8 PM"
- Chinese: "預約明天晚上7點,四位成人" (Reserve for tomorrow at 7 PM, four adults)
- Mixed: "我想book 4位,tomorrow at 7 PM" (I'd like to book for 4, tomorrow at 7 PM)
📊 Training Details
Dataset
- Source: dinercall-ner dataset
- Samples: 20,000 synthetic reservation requests
- Language: 70% Chinese, 30% English
- Features: ASR noise simulation, realistic error patterns
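This card does not describe how the ASR noise in the dataset was generated. Purely as an illustration of what "ASR noise simulation" can look like, the toy sketch below injects two error types that also appear in the noisy example later in this card: duplicated characters ("adullts") and repeated words ("for for"). The function name, the `rate` parameter, and the error mix are all assumptions, and the word splitting only suits space-separated text such as English.

```python
import random
from typing import Optional


def add_asr_noise(text: str, rate: float = 0.2, seed: Optional[int] = None) -> str:
    """Toy noise injector (hypothetical, not the dataset's actual method)."""
    rng = random.Random(seed)
    noisy_words = []
    for word in text.split():
        roll = rng.random()
        if roll < rate / 2 and len(word) > 2:
            # Duplicate one character inside the word ("adults" -> "adullts").
            i = rng.randrange(1, len(word))
            noisy_words.append(word[:i] + word[i - 1] + word[i:])
        elif roll < rate:
            # Repeat the whole word ("for" -> "for for").
            noisy_words.extend([word, word])
        else:
            noisy_words.append(word)
    return " ".join(noisy_words)
```

Passing a fixed `seed` makes the corruption reproducible, which helps when comparing model behavior on the same noisy inputs.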
Configuration
| Parameter | Value |
|---|---|
| Base Model | unsloth/gemma-3-270m-it-unsloth-bnb-4bit |
| Max Sequence Length | 256 tokens |
| Learning Rate | 2e-5 |
| Batch Size | 4 (gradient accumulation: 2) |
| Training Epochs | 10 |
| LoRA Rank | 32 |
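To put the configuration in perspective, the arithmetic below derives the effective batch size and the approximate number of optimizer steps. It is a rough sketch that assumes a single device and that all 20,000 samples were used for training; the card does not state either, so a held-out split or multi-GPU setup would change the numbers.

```python
import math

num_samples = 20_000   # from the Dataset section
per_device_batch = 4   # from the Configuration table
grad_accum_steps = 2   # from the Configuration table
epochs = 10            # from the Configuration table

# Assumption: one device, no held-out validation split.
effective_batch = per_device_batch * grad_accum_steps
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)
# prints: 8 2500 25000
```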
📝 Advanced Examples & Outputs
Complex Input Examples with Outputs
# Example 1: Complex English with mixed formatting
input_text = "Could you please reserve a table for 3 adults and 2 children on December 24th around 8 PM? My contact is +886-987-654-321"
output = {
    "num_people": "5",
    "reservation_date": "December 24th at 8 PM",
    "phone_num": "0987654321"
}
# Example 2: Chinese with complex date and mixed digits
# (English: "We'd like to reserve for the 15th of next month at 7:30 PM,
# 4 adults and 2 children, phone is 0987-654321")
input_text = "我們想要預約下個月15號晚上7點半,4大2小,電話是零九八七-六五四三二一"
output = {
    "num_people": "6",
    "reservation_date": "下個月15號晚上7點半",
    "phone_num": "0987654321"
}
# Example 3: Noisy ASR input with complex elements
input_text = "Book for for 2 adullts and 1 childreen onn nexts Friday at 6:45 PM, fone 09八七六五四三二一"
output = {
    "num_people": "3",
    "reservation_date": "next Friday at 6:45 PM",
    "phone_num": "0987654321"
}
# Example 4: Mixed language with complex request
# (English: "I'd like to book for 3 adults and 2 children, the time is
# next Wednesday at 7:30 PM, contact number is 0912-345-678")
input_text = "我想book 3大人2小孩,time是next Wednesday at 7:30 PM,contact number是0912-345-678"
output = {
    "num_people": "5",
    "reservation_date": "next Wednesday at 7:30 PM",
    "phone_num": "0912345678"
}
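Input/output pairs like the four above can double as a small regression set. The sketch below computes per-field exact-match accuracy over paired predictions and references. The function name `field_accuracy` and the choice of exact string match as the metric are assumptions for illustration; this card does not report how the model was evaluated.

```python
FIELDS = ("num_people", "reservation_date", "phone_num")


def field_accuracy(predictions: list[dict], references: list[dict]) -> dict:
    """Per-field exact-match accuracy (hypothetical helper)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return {field: 0.0 for field in FIELDS}
    scores = {}
    for field in FIELDS:
        # Count pairs whose values for this field are identical strings.
        hits = sum(
            pred.get(field, "") == ref.get(field, "")
            for pred, ref in zip(predictions, references)
        )
        scores[field] = hits / len(references)
    return scores


# References taken from Examples 1 and 3 above.
references = [
    {"num_people": "5", "reservation_date": "December 24th at 8 PM", "phone_num": "0987654321"},
    {"num_people": "3", "reservation_date": "next Friday at 6:45 PM", "phone_num": "0987654321"},
]
# Hypothetical predictions: the second one misses the phone number.
predictions = [
    {"num_people": "5", "reservation_date": "December 24th at 8 PM", "phone_num": "0987654321"},
    {"num_people": "3", "reservation_date": "next Friday at 6:45 PM", "phone_num": ""},
]
print(field_accuracy(predictions, references))
# {'num_people': 1.0, 'reservation_date': 1.0, 'phone_num': 0.5}
```

Exact match is strict for `reservation_date`, where several phrasings can be equally correct, so a looser comparison may be more appropriate for that field.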
⚠️ Limitations & Considerations
Technical Limitations
- 🎯 Phone Numbers: Only Taiwanese mobile numbers (09XXXXXXXX)
- 🌍 Geography: Optimized for Taiwanese reservation patterns
- 🎙️ ASR Types: Best performance on simulated ASR errors similar to training data
- 💬 Language Mix: Handles Chinese/English mixing but may struggle with other languages
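Because the model is limited to Taiwanese mobile numbers, a rule-based check on the `phone_num` field can catch outputs that break the 09XXXXXXXX pattern before they reach a booking system. The sketch below is an assumption about how such a post-processing step might look, not a description of how the model works internally. The name `normalize_tw_mobile` is hypothetical, and the function expects an already isolated phone string rather than a full sentence.

```python
import re

# Map Chinese numerals to ASCII digits (both 零 and 〇 for zero).
_CN_DIGITS = str.maketrans({
    "零": "0", "〇": "0", "一": "1", "二": "2", "三": "3", "四": "4",
    "五": "5", "六": "6", "七": "7", "八": "8", "九": "9",
})


def normalize_tw_mobile(text: str) -> str:
    """Return a 09XXXXXXXX number from an isolated phone string, or "" if invalid."""
    converted = text.translate(_CN_DIGITS)
    # Drop everything except digits and the plus sign.
    compact = re.sub(r"[^\d+]", "", converted)
    # Rewrite the +886 country code to the local leading 0.
    if compact.startswith("+886"):
        compact = "0" + compact[4:]
    compact = compact.replace("+", "")
    return compact if re.fullmatch(r"09[0-9]{8}", compact) else ""


print(normalize_tw_mobile("+886-912-345-678"))  # 0912345678
print(normalize_tw_mobile("市話02-1234-5678"))   # prints an empty line (landline)
```

Running the model's `phone_num` output through a check like this gives a deterministic guarantee that complements the learned behavior.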
Ethical Considerations
- 🔒 Privacy: Only extracts mobile numbers; landline numbers are ignored
- 📋 Consent: Ensure proper user consent for data processing
- ⚖️ Compliance: Follow local regulations for data handling
📚 Citation
If you use this model in your research, please cite:
@software{dinercall_ner_model_2025,
author = {Luigi},
title = {Gemma-3-270M Fine-tuned for Restaurant Reservation NER},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/Luigi/dinercall-ner}
}
🆘 Support
For questions, issues, or contributions:
- 📧 Open an issue on the Hugging Face repository
- 💬 Check the examples above for common usage patterns
- 🔧 Review the limitations section before deployment
📄 License
This model inherits the license terms of the base Gemma model. Please review Google's license terms for specific usage rights and restrictions.