πŸ“š IMDB Sentiment Classifier (Dual-Model)

This repository contains two deep learning models for sentiment classification of IMDb movie reviews, each trained with a different vocabulary size and parameter count.


πŸ“‚ Dataset & Training Notes

  • These models were trained on a dataset of approximately 150,000 IMDB movie reviews, which were manually scraped from the web.
  • The reviews were pseudo-labeled using soft probability outputs from the cardiffnlp/twitter-roberta-base-sentiment model.
  • This method provided probabilistic sentiment labels (Negative / Neutral / Positive) for training, allowing the models to learn from soft targets rather than hard class labels (a minimal sketch of this labeling step follows the list).
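
The sketch below shows how such soft labels can be produced with the πŸ€— Transformers pipeline API. It is an illustration only: the batching, truncation settings, and label mapping are assumptions, not the exact script used to build this dataset.

from transformers import pipeline

# Load the sentiment model used for pseudo-labeling.
# top_k=None returns the full probability distribution instead of a single label.
labeler = pipeline(
    "text-classification",
    model="cardiffnlp/twitter-roberta-base-sentiment",
    top_k=None,
    truncation=True,
)

reviews = [
    "An absolute masterpiece, I was hooked from the first scene.",
    "Two hours of my life I will never get back.",
]

for review, scores in zip(reviews, labeler(reviews)):
    # scores is a list of {'label': ..., 'score': p} entries, where
    # LABEL_0 = Negative, LABEL_1 = Neutral, LABEL_2 = Positive for this model.
    soft_target = {s["label"]: round(s["score"], 4) for s in scores}
    print(review[:40], soft_target)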

πŸ“ Dataset

Citation (please cite if you use this dataset)

@misc{imdb-multimovie-reviews,
  title = {IMDb Multi-Movie Review Dataset},
  author = {Daksh Bhardwaj},
  year = {2025},
  url = {https://huggingface.co/datasets/Daksh0505/IMDB-Reviews},
  note = {Accessed: 2025-07-17}
}

🧠 Models

πŸ”Ή Model A

  • Filename: sentiment_model_imdb_6.6M.keras
  • Trainable Parameters: ~6.6 million
  • Total Parameters: ~13.06 million
  • Vocabulary Size: 50,000 tokens
  • Description: Lightweight and efficient; optimized for speed.

πŸ”Ή Model B

  • Filename: sentiment_model_imdb_34M.keras
  • Trainable Parameters: ~34 million
  • Total Parameters: ~99.43 million
  • Vocabulary Size: 256,000 tokens
  • Description: Larger and more expressive; higher accuracy on nuanced reviews.
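
If you want to verify the parameter counts listed above after loading a model (loading is shown further down), a small helper like the following works with any Keras model; nothing here is specific to these checkpoints.

from tensorflow.keras import backend as K

def report_params(model):
    # K.count_params returns the number of scalar weights in a tensor.
    trainable = sum(K.count_params(w) for w in model.trainable_weights)
    non_trainable = sum(K.count_params(w) for w in model.non_trainable_weights)
    print(f"trainable: {trainable:,}  total: {trainable + non_trainable:,}")

# Example (after loading Model A as shown below): report_params(model_a)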

πŸ—‚ Tokenizers

Each model uses its own tokenizer in Keras JSON format:

  • tokenizer_50k.json β†’ used with Model A
  • tokenizer_256k.json β†’ used with Model B

πŸ”§ Load Models & Tokenizers (from Hugging Face Hub)

from huggingface_hub import hf_hub_download
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.text import tokenizer_from_json
import json

# === Model A ===
model_path_a = hf_hub_download(repo_id="Daksh0505/sentiment-model-imdb", filename="sentiment_model_imdb_6.6M.keras")
tokenizer_path_a = hf_hub_download(repo_id="Daksh0505/sentiment-model-imdb", filename="tokenizer_50k.json")

with open(tokenizer_path_a, "r") as f:
    data_a = json.load(f)
# tokenizer_from_json expects a JSON string; re-serialize if the file holds a dict
tokenizer_a = tokenizer_from_json(data_a if isinstance(data_a, str) else json.dumps(data_a))

model_a = load_model(model_path_a)

# === Model B ===
model_path_b = hf_hub_download(repo_id="Daksh0505/sentiment-model-imdb", filename="sentiment_model_imdb_34M.keras")
tokenizer_path_b = hf_hub_download(repo_id="Daksh0505/sentiment-model-imdb", filename="tokenizer_256k.json")

with open(tokenizer_path_b, "r") as f:
    data_b = json.load(f)
# tokenizer_from_json expects a JSON string; re-serialize if the file holds a dict
tokenizer_b = tokenizer_from_json(data_b if isinstance(data_b, str) else json.dumps(data_b))

model_b = load_model(model_path_b)
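
Once a model and its tokenizer are loaded, inference looks roughly like this. The sequence length, padding/truncation settings, and the Negative / Neutral / Positive output order are assumptions; check model_a.input_shape and the training setup to confirm them.

from tensorflow.keras.preprocessing.sequence import pad_sequences

# Assumed sequence length: fall back to 200 if the model accepts variable-length input.
maxlen = model_a.input_shape[1] or 200

review = "A surprisingly touching film with a brilliant lead performance."
seq = tokenizer_a.texts_to_sequences([review])
padded = pad_sequences(seq, maxlen=maxlen, padding="post", truncating="post")

probs = model_a.predict(padded)[0]
labels = ["Negative", "Neutral", "Positive"]  # assumed to match the soft-label order
print({label: float(p) for label, p in zip(labels, probs)})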

πŸš€ Try the Live Demo

Click below to test both models live in your browser:

Open in Spaces
