🍽️ Gemma-3-270M Restaurant Reservation NER Model


A fine-tuned version of Google's Gemma-3-270M model that extracts restaurant reservation details from user messages and is robust to noisy text produced by automatic speech recognition (ASR).

✨ Key Features

  • 🎯 Entity Extraction: Identifies three reservation fields: party size (num_people), date and time (reservation_date), and phone number (phone_num)
  • 🌐 Bilingual Support: Handles both Chinese and English input
  • 🎙️ ASR Robust: Optimized for noisy speech recognition output
  • 📱 Phone Focus: Specialized for Taiwanese mobile number extraction

📋 Comprehensive Example: All Three Entities

Complex Input Text:
"Hi, I'd like to make a reservation for 2 adults and 3 children on the 15th of next month around 7:30 in the evening, and you can reach me at +886-912-345-678"

Extracted Output:

{
  "num_people": "5",
  "reservation_date": "15th of next month at 7:30 PM", 
  "phone_num": "0912345678"
}

Entity Breakdown from this Complex Example:

| Entity | Extracted From Input | Normalized Output |
| --- | --- | --- |
| num_people | "2 adults and 3 children" | "5" (summed total) |
| reservation_date | "15th of next month around 7:30 in the evening" | "15th of next month at 7:30 PM" (normalized time format) |
| phone_num | "+886-912-345-678" | "0912345678" (international format converted to local) |

⚠️ Important Note: Phone Number Handling

This model exclusively extracts Taiwanese 10-digit mobile numbers (09XXXXXXXX format):

Extracted: Mobile numbers with complex variations

  • "+886-912-345-678" → "0912345678" (international format)
  • "零九一二三四五六七八" → "0912345678" (Chinese characters)
  • "09 12 34 56 78" → "0912345678" (spaced format)

Ignored: Non-mobile numbers

  • "市話02-1234-5678" → "" (landline)
  • "國際電話+1-555-123-4567" → "" (international non-Taiwanese)
  • "免付費0800-123-456" → "" (toll-free)
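
The mapping above can be cross-checked with a small rule-based normalizer. The sketch below is not how the model works internally; it is a hypothetical helper (`normalize_tw_mobile` and `CHINESE_DIGITS` are names introduced here for illustration) that reproduces the six documented cases, assuming separators are only spaces or hyphens and that Chinese digits are written one character per digit.

```python
import re

# Map Chinese numerals (including the "〇" variant of zero) to ASCII digits.
CHINESE_DIGITS = str.maketrans("零〇一二三四五六七八九", "00123456789")

# A Taiwanese mobile number: "+886" / "886" / "0" prefix, then 9 and eight more digits.
MOBILE_PATTERN = re.compile(r"(?<!\d)(?:\+?886|0)(9\d{8})(?!\d)")


def normalize_tw_mobile(text: str) -> str:
    """Return the first 09XXXXXXXX number found in `text`, or "" if there is none."""
    # Step 1: turn Chinese numerals into ASCII digits.
    converted = text.translate(CHINESE_DIGITS)
    # Step 2: drop spaces and hyphens that sit between two digits.
    compact = re.sub(r"(?<=\d)[\s\-]+(?=\d)", "", converted)
    # Step 3: look for a mobile number and rewrite it in local format.
    match = MOBILE_PATTERN.search(compact)
    return "0" + match.group(1) if match else ""


print(normalize_tw_mobile("+886-912-345-678"))   # 0912345678
print(normalize_tw_mobile("市話02-1234-5678"))    # prints an empty line
```

A helper like this could serve as a sanity check on `phone_num`: if the model returns a number that the rule-based pass cannot find in the input, the result may deserve a second look.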

🚀 Quick Start

Installation

pip install transformers torch

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import json

# Load model and tokenizer
model_name = "Luigi/dinercall-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

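# System prompt (kept in the original Chinese). English translation:
#   "You are an assistant that extracts reservation information from user messages and outputs it as JSON.
#    The JSON must contain three fields: num_people, reservation_date, phone_num.
#    If a field has no information, use an empty string. Output only JSON; do not add any other text."
system_prompt = """你是一個助理,負責從用戶消息中提取預訂資訊並以JSON格式輸出。
JSON必須包含三個字段: num_people, reservation_date, phone_num。
如果某個字段沒有信息,使用空字符串。只輸出JSON,不要添加任何其他文字。"""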

# Example with complex input
user_input = "Hi, I'd like to make a reservation for 2 adults and 3 children on the 15th of next month around 7:30 in the evening, and you can reach me at +886-912-345-678"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_input}
]

# Generate response
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False  # greedy decoding; temperature is ignored when sampling is off
    )

# Process output
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
result = json.loads(response)
print(json.dumps(result, ensure_ascii=False))
# Output: {"num_people": "5", "reservation_date": "15th of next month at 7:30 PM", "phone_num": "0912345678"}
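
The `json.loads(response)` call above raises an exception if the model emits anything other than a bare JSON object. A more defensive parse might look like the sketch below. `parse_reservation` and `EXPECTED_KEYS` are hypothetical names introduced for illustration, and falling back to empty strings on failure is an assumption that mirrors the system prompt's rule for missing fields.

```python
import json
import re

EXPECTED_KEYS = ("num_people", "reservation_date", "phone_num")


def parse_reservation(response: str) -> dict:
    """Parse raw model text into a dict holding exactly the three expected keys."""
    empty = {key: "" for key in EXPECTED_KEYS}
    # Take the span from the first "{" to the last "}" in case extra text surrounds the JSON.
    match = re.search(r"\{.*\}", response, flags=re.DOTALL)
    if match is None:
        return empty
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    # Keep only the expected keys; missing or null values become "", others become strings.
    result = {}
    for key in EXPECTED_KEYS:
        value = data.get(key)
        result[key] = "" if value is None else str(value)
    return result


print(parse_reservation('Sure: {"num_people": 5, "phone_num": "0912345678"}'))
```

In the snippet above, `result = parse_reservation(response)` could replace the direct `json.loads` call.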

🎯 Use Cases

Perfect For

  • 🗣️ Voice assistant reservation systems
  • 🤖 Chatbot booking interfaces
  • 📞 Call center automation
  • 📱 Mobile app reservation features

Ideal Input Types

  • English: "Book for 6 people next Friday at 8 PM"
  • Chinese: "預約明天晚上7點,四位成人" (Reserve for tomorrow at 7 PM, four adults)
  • Mixed: "我想book 4位,tomorrow at 7 PM" (I'd like to book for 4, tomorrow at 7 PM)

📊 Training Details

Dataset

  • Source: dinercall-ner dataset
  • Samples: 20,000 synthetic reservation requests
  • Language: 70% Chinese, 30% English
  • Features: ASR noise simulation, realistic error patterns

Configuration

| Parameter | Value |
| --- | --- |
| Base Model | unsloth/gemma-3-270m-it-unsloth-bnb-4bit |
| Max Sequence Length | 256 tokens |
| Learning Rate | 2e-5 |
| Batch Size | 4 (gradient accumulation: 2) |
| Training Epochs | 10 |
| LoRA Rank | 32 |
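
To make the batch settings concrete, the arithmetic below derives the effective batch size and an approximate optimizer step count. It assumes a single device and that all 20,000 samples are used for training with no held-out split; neither is stated in this card, so the step counts are estimates only.

```python
import math

num_samples = 20_000     # from the Dataset section
per_device_batch = 4     # Batch Size
grad_accum_steps = 2     # gradient accumulation
epochs = 10              # Training Epochs

# Gradients from 2 micro-batches of 4 are accumulated before each optimizer step.
effective_batch = per_device_batch * grad_accum_steps
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)  # 8 2500 25000
```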

📝 Advanced Examples & Outputs

Complex Input Examples with Outputs

# Example 1: Complex English with mixed formatting
input_text = "Could you please reserve a table for 3 adults and 2 children on December 24th around 8 PM? My contact is +886-987-654-321"
output = {
  "num_people": "5",
  "reservation_date": "December 24th at 8 PM",
  "phone_num": "0987654321"
}

# Example 2: Chinese with complex date and mixed digits
input_text = "我們想要預約下個月15號晚上7點半,4大2小,電話是零九八七-六五四三二一"
output = {
  "num_people": "6",
  "reservation_date": "下個月15號晚上7點半",
  "phone_num": "0987654321"
}

# Example 3: Noisy ASR input with complex elements
input_text = "Book for for 2 adullts and 1 childreen onn nexts Friday at 6:45 PM, fone 09八七六五四三二一"
output = {
  "num_people": "3",
  "reservation_date": "next Friday at 6:45 PM",
  "phone_num": "0987654321"
}

# Example 4: Mixed language with complex request
input_text = "我想book 3大人2小孩,time是next Wednesday at 7:30 PM,contact number是0912-345-678"
output = {
  "num_people": "5",
  "reservation_date": "next Wednesday at 7:30 PM",
  "phone_num": "0912345678"
}
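
The outputs above share a fixed shape: three string fields, a numeric `num_people`, and a `phone_num` that is either empty or in 09XXXXXXXX form. A downstream application might check that shape before acting on a result. The sketch below is one such check; `validate_reservation` is a hypothetical helper, and treating `num_people` as an ASCII digit string is an assumption drawn from the examples rather than a documented guarantee.

```python
import re


def validate_reservation(result: dict) -> list[str]:
    """Return a list of problems found in a model output; an empty list means it looks valid."""
    problems = []
    # All three fields must be present and must be strings.
    for key in ("num_people", "reservation_date", "phone_num"):
        if not isinstance(result.get(key), str):
            problems.append(f"{key}: missing or not a string")

    # num_people may be empty; otherwise it should be a plain number such as "5".
    num_people = result.get("num_people")
    if isinstance(num_people, str) and num_people and not re.fullmatch(r"[0-9]+", num_people):
        problems.append("num_people: not a digit string")

    # phone_num may be empty; otherwise it should be a 10-digit Taiwanese mobile number.
    phone = result.get("phone_num")
    if isinstance(phone, str) and phone and not re.fullmatch(r"09[0-9]{8}", phone):
        problems.append("phone_num: not in 09XXXXXXXX format")

    return problems


example = {"num_people": "5", "reservation_date": "December 24th at 8 PM", "phone_num": "0987654321"}
print(validate_reservation(example))  # []
```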

⚠️ Limitations & Considerations

Technical Limitations

  • 🎯 Phone Numbers: Only Taiwanese mobile numbers (09XXXXXXXX)
  • 🌍 Geography: Optimized for Taiwanese reservation patterns
  • 🎙️ ASR Types: Best performance on simulated ASR errors similar to training data
  • 💬 Language Mix: Handles Chinese/English mixing but may struggle with other languages

Ethical Considerations

  • 🔒 Privacy: Only extracts mobile numbers; landline numbers are ignored
  • 📋 Consent: Ensure proper user consent for data processing
  • ⚖️ Compliance: Follow local regulations for data handling

📚 Citation

If you use this model in your research, please cite:

@software{dinercall_ner_model_2025,
  author = {Luigi},
  title = {Gemma-3-270M Fine-tuned for Restaurant Reservation NER},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Luigi/dinercall-ner}
}

🆘 Support

For questions, issues, or contributions:

  • 📧 Open an issue on the Hugging Face repository
  • 💬 Check the examples above for common usage patterns
  • 🔧 Review the limitations section before deployment

📄 License

This model inherits the license terms of the base Gemma model. Please review Google's license terms for specific usage rights and restrictions.
