---
license: mit
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: text-classification
tags:
- NLP
- sentiment
- logisticregression
---
# 🧠 Sentiment Analysis with Logistic Regression

This model performs **multi-class sentiment analysis** on tweets, classifying them into the following categories:
- Positive
- Negative
- Neutral
- Irrelevant

It uses a custom preprocessing pipeline with:
<!-- - Text cleaning (URL, mention, hashtag, punctuation removal)-->
- CountVectorizer
- TF-IDF transformation
- Logistic Regression classifier (`max_iter=1000`)

---

## 🏗 Model Architecture

<!-- - **TextCleaner**: Custom scikit-learn transformer for consistent text preprocessing.-->
- **CountVectorizer**: Converts tweets into token count vectors.
- **TfidfTransformer**: Reweights tokens by importance.
- **LogisticRegression**: Interpretable and robust classification baseline.
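
The three stages above could be wired together roughly as follows. This is a minimal sketch, not the exact training script: the step names (`counts`, `tfidf`, `clf`), the `build_pipeline` helper, and the toy tweets are assumptions for illustration, and every setting other than `max_iter=1000` is left at the scikit-learn default.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def build_pipeline() -> Pipeline:
    """Assemble the count -> TF-IDF -> logistic regression stages (hypothetical step names)."""
    return Pipeline([
        ("counts", CountVectorizer()),                # tweets -> token count vectors
        ("tfidf", TfidfTransformer()),                # reweight counts by TF-IDF
        ("clf", LogisticRegression(max_iter=1000)),   # multi-class classifier
    ])


# Toy data for illustration only; the real training set is not included here.
texts = [
    "I love this game", "Best update ever",
    "This patch is terrible", "I hate the new menu",
    "The servers restart at noon", "Version two ships on Monday",
    "Check out my cooking channel", "Anyone selling concert tickets",
]
labels = [
    "Positive", "Positive",
    "Negative", "Negative",
    "Neutral", "Neutral",
    "Irrelevant", "Irrelevant",
]

pipeline = build_pipeline()
pipeline.fit(texts, labels)
print(pipeline.predict(["I love the new update"]))
```

Keeping all stages in one `Pipeline` means raw strings can be passed straight to `predict`, which matches how the saved model is called in the Usage section.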

---

## 🧪 Evaluation

Evaluated on a separate validation set of 999 tweets:

| Class       | Precision | Recall | F1-score |
|-------------|-----------|--------|----------|
| Irrelevant  | 0.88      | 0.85   | 0.87     |
| Negative    | 0.87      | 0.94   | 0.91     |
| Neutral     | 0.97      | 0.86   | 0.91     |
| Positive    | 0.89      | 0.94   | 0.91     |
| **Overall Accuracy** |        |        | **0.90**     |
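
Per-class figures like those in the table can be computed with scikit-learn's metrics functions. The sketch below uses a handful of made-up labels purely to show the calls; it does not reproduce the numbers above, and `y_true` and `y_pred` are placeholders for the real validation labels and model predictions.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

LABELS = ["Irrelevant", "Negative", "Neutral", "Positive"]

# Placeholder labels for illustration; substitute the validation set and model output.
y_true = ["Positive", "Positive", "Negative", "Negative", "Neutral", "Irrelevant"]
y_pred = ["Positive", "Negative", "Negative", "Negative", "Neutral", "Irrelevant"]

# Arrays come back in the same order as LABELS.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=LABELS, zero_division=0
)
accuracy = accuracy_score(y_true, y_pred)

for name, p, r, f in zip(LABELS, precision, recall, f1):
    print(f"{name:<11} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"Overall accuracy: {accuracy:.2f}")
```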

---

## 📦 Usage

```python
import joblib

model = joblib.load("sentiment_model_lr.pkl")
user_input = "This update is surprisingly good!"

prediction = model.predict([user_input])
print(prediction[0])  # → Positive, Negative, etc.
```

> ⚠️ Requires scikit-learn 1.6.1+ to avoid version mismatch warnings.
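
One way to catch a version mismatch before loading the pickle is to compare the installed scikit-learn version against 1.6.1. The helpers below (`version_tuple`, `meets_minimum`) are hypothetical names written for this sketch, and the parsing is deliberately simple: it only looks at the leading digits of the first three version components.

```python
import re


def version_tuple(version: str) -> tuple:
    """Turn a version string such as '1.6.1' or '1.7.0rc1' into a tuple of three ints."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad short versions such as '1.7'
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "1.6.1") -> bool:
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    import sklearn

    if not meets_minimum(sklearn.__version__):
        print(f"scikit-learn {sklearn.__version__} is older than 1.6.1; "
              "expect version mismatch warnings when loading the model.")
except ImportError:
    print("scikit-learn is not installed.")
```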

---

## 📚 Dataset

Tweets were preprocessed using a `clean_text` routine and labeled into
the four sentiment categories. If you’d like to experiment or re-train, contact
the author or fork this repo.
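
The `clean_text` routine itself is not included in this card. As a rough idea of what such a step might look like, the sketch below strips URLs, mentions, hashtags, and punctuation. The regular expressions, the lowercasing, and the choice to drop whole hashtags rather than only the `#` symbol are all assumptions, not the author's actual implementation.

```python
import re
import string

# Hypothetical patterns; the original routine may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Lowercase a tweet and remove URLs, mentions, hashtags, and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)       # drop links before punctuation is stripped
    text = MENTION_RE.sub(" ", text)   # drop @user mentions
    text = HASHTAG_RE.sub(" ", text)   # drop #hashtags entirely (assumption)
    text = text.translate(PUNCT_TABLE)
    return " ".join(text.split())      # collapse repeated whitespace


print(clean_text("Loving the new patch!! @dev_team #gaming https://example.com/x"))
```

If the model was trained on cleaned text, the same cleaning should be applied to any input before calling `predict`, so that inference sees text in the same form as training.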

---
## 🧑‍💻 Author

- Built by @arshvir
- Model version: 1.0
- License: MIT

---