metadata
title: ViBERTa
emoji: 💬
colorFrom: red
colorTo: gray
sdk: streamlit
app_file: app.py
pinned: false
sdk_version: 1.44.1

ViBERTa: Unwrapping Customer Sentiments with Sentiment Analysis! 🔍

🌟 Overview

ViBERTa (VIBE + DeBERTa) is a sentiment analysis model fine-tuned on the McDonald's review dataset. Built on Microsoft's DeBERTa-v3 base model, it classifies each customer review as negative, neutral, or positive.

📋 Model Specifications

Key Details

  • Model Name: ViBERTa
  • Base Model: microsoft/deberta-v3-base
  • Primary Task: Sentiment Classification
  • Sentiment Classes:
    • 0: Negative
    • 1: Neutral
    • 2: Positive

🔬 Technical Highlights

  • Transformer encoder with disentangled attention (DeBERTa-v3)
  • Fine-tuned on domain-specific McDonald's review data
  • Accuracy of 0.856 and F1-score of 0.853 (see Performance Metrics)

🗂 Dataset Insights

McDonald's Review Dataset

  • Source: Kaggle
  • Comprehensive collection of customer reviews
  • Manually labeled sentiment categories
  • Diverse range of customer experiences

🛠 Training Methodology

Configuration Parameters

| Parameter | Value |
|---|---|
| Batch Size | 16 |
| Total Epochs | 3 |
| Learning Rate | 2e-5 |
| Optimizer | AdamW |
| Learning Rate Scheduler | Cosine decay with warmup |
| Warmup Ratio | 10% |
| Weight Decay | 0.01 |
| Mixed Precision | Enabled (fp16) |
| Gradient Accumulation Steps | 2 |
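
As a rough illustration, the parameters above could be written as a plain dictionary keyed by the corresponding `transformers.TrainingArguments` parameter names (as of the 4.x releases). This is a sketch, not the original training script: treating the batch size as per-device, the choice of `adamw_torch` for AdamW, and the `output_dir` value are all assumptions.

```python
# Hyperparameters from the table, keyed by TrainingArguments parameter names.
training_config = {
    "per_device_train_batch_size": 16,  # Batch Size (assumed to be per device)
    "num_train_epochs": 3,
    "learning_rate": 2e-5,
    "optim": "adamw_torch",             # AdamW (assumed PyTorch implementation)
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,                # 10% of training steps
    "weight_decay": 0.01,
    "fp16": True,                       # mixed precision; needs a CUDA GPU
    "gradient_accumulation_steps": 2,
}

# Number of samples that contribute to each optimizer update on one device.
effective_batch_size = (
    training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
)
print(effective_batch_size)  # 32

# With transformers installed, the dictionary could be passed along like this
# ("viberta_out" is a placeholder directory name):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="viberta_out", **training_config)
```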

Training Approach

  • Tokenization using DeBERTa tokenizer
  • Cross-entropy loss function
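  • Learning rate warmup over the first 10% of steps, followed by cosine decay
  • Gradient accumulation over 2 steps, giving an effective batch size of 32 (16 × 2) on a single device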
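
To make the schedule concrete, here is a minimal pure-Python sketch of a learning rate that warms up and then follows a cosine decay to zero. It assumes the warmup is linear and uses a made-up dataset size of 10,000 reviews; `lr_at_step` is a hypothetical helper, not a function from the training code.

```python
import math

def lr_at_step(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Hypothetical run: 10,000 training reviews, 3 epochs, batch 16 x 2 accumulation.
num_examples = 10_000  # assumption, not the real dataset size
steps_per_epoch = math.ceil(num_examples / (16 * 2))
total_steps = steps_per_epoch * 3

# Print the rate at the start, end of warmup, midpoint, and final step.
for step in (0, total_steps // 10, total_steps // 2, total_steps):
    print(step, f"{lr_at_step(step, total_steps):.2e}")
```

The rate starts at zero, peaks at 2e-5 once warmup ends, and falls back to zero at the final step.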

🚀 Quick Start Guide

Installation

Install the required dependencies:

# Create a virtual environment (recommended)
python -m venv viberta_env
source viberta_env/bin/activate  # On Windows, use `viberta_env\Scripts\activate`

# Install dependencies
pip install torch transformers datasets

Model Inference

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "iSathyam03/McD_Reviews_Sentiment_Analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

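def predict_sentiment(text):
    """Predict the sentiment label for a single review string."""
    # Tokenize; max_length=512 is an explicit cap in case the tokenizer
    # config does not define a maximum length for truncation.
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=512,
        padding=True,
    )

    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)

    # Pick the class with the highest logit
    logits = outputs.logits
    prediction = torch.argmax(logits, dim=1).item()

    sentiment_labels = {
        0: "Negative",
        1: "Neutral",
        2: "Positive",
    }

    return sentiment_labels[prediction]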

# Example usage
review = "The fries were amazing but the burger was stale."
sentiment = predict_sentiment(review)
print(f"Sentiment: {sentiment}")
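
`predict_sentiment` returns only the top label. For mixed reviews like the example above, per-class probabilities can be more informative. The sketch below applies a softmax to one row of logits in plain Python; `scores_from_logits` is a hypothetical helper and the logits are made-up values, not real model output.

```python
import math

LABELS = {0: "Negative", 1: "Neutral", 2: "Positive"}

def scores_from_logits(logits):
    """Convert one row of raw logits into {label: probability} via softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    return {LABELS[i]: e / total for i, e in enumerate(exps)}

# Made-up logits for illustration; with the model loaded they would come
# from `outputs.logits[0].tolist()` inside predict_sentiment.
example_logits = [0.4, 1.9, 0.2]
scores = scores_from_logits(example_logits)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```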

📊 Performance Metrics

Evaluation Results

  • Accuracy: 0.856
  • F1-Score: 0.853

Confusion Matrix

[Include a visual or textual representation of the confusion matrix]
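
Until the matrix is filled in, one way to produce it for your own labeled reviews is sketched below in plain Python. The label lists are placeholders for illustration only and are not ViBERTa's evaluation results; `confusion_matrix` and `accuracy_and_f1` are hypothetical helper names.

```python
LABELS = ["Negative", "Neutral", "Positive"]

def confusion_matrix(y_true, y_pred, num_classes=3):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def accuracy_and_f1(matrix):
    """Return overall accuracy and a list of per-class F1 scores."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    f1_scores = []
    for i in range(n):
        tp = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(n))  # column sum: TP + FP
        actual = sum(matrix[i])                          # row sum: TP + FN
        denom = predicted + actual
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return correct / total, f1_scores

# Placeholder labels (0 = Negative, 1 = Neutral, 2 = Positive).
y_true = [0, 0, 1, 1, 2, 2, 2, 0]
y_pred = [0, 1, 1, 1, 2, 2, 0, 0]

cm = confusion_matrix(y_true, y_pred)
acc, f1 = accuracy_and_f1(cm)
for name, row in zip(LABELS, cm):
    print(f"{name:<9}", row)
print(f"accuracy={acc:.3f}", [round(s, 3) for s in f1])
```

Averaging the per-class F1 values gives a macro F1; whether the F1-Score reported above is macro or weighted is not stated.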

🌐 Deployment Options

  1. Hugging Face Inference API
    • Easy integration
    • Scalable cloud deployment
  2. Web Application Frameworks
    • Streamlit for interactive demos
    • Gradio for quick UI prototypes
    • Flask/FastAPI for robust REST APIs
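
Whichever framework is used, a REST route mostly wraps a small validate-then-predict step. The sketch below keeps that step framework-agnostic so it runs without Flask or FastAPI installed; `classify_payload` and `fake_predict` are hypothetical names, and the request shape (`{"text": ...}`) is an assumption rather than a documented API.

```python
def classify_payload(payload, predict):
    """Validate a JSON-like request body and return a response dict.

    `predict` is any callable mapping a review string to a label, such as
    the predict_sentiment function from the Quick Start Guide.
    """
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text.strip():
        return {"status": 400, "error": "'text' must be a non-empty string"}
    return {"status": 200, "sentiment": predict(text.strip())}

# Stand-in predictor so the sketch runs without downloading the model.
def fake_predict(text):
    return "Positive" if "amazing" in text.lower() else "Neutral"

print(classify_payload({"text": "The fries were amazing"}, fake_predict))
print(classify_payload({"text": "   "}, fake_predict))
```

In a real service, `fake_predict` would be swapped for the model-backed function and the returned status mapped to the framework's HTTP response.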

🔍 Limitations & Considerations

  • Performance may vary with out-of-domain text
  • Potential bias inherited from training data
  • Recommended to validate on your specific use case

📚 References & Citations

Primary Citation

@article{he2020deberta,
  title={DeBERTa: Decoding-enhanced BERT with Disentangled Attention},
  author={He, Pengcheng and Liu, Xiaodong and Gao, Jianfeng and Chen, Weizhu},
  journal={arXiv preprint arXiv:2006.03654},
  year={2020}
}

💡 Pro Tip: Always validate model performance on your specific dataset!

⭐ Found ViBERTa helpful? Don't forget to star the repository! 🌟