Update README.md
README.md
CHANGED
@@ -23,36 +23,36 @@ model-index:

 # Model Card: BLOOM-560m for Personal Sharing Classification

-
+This model is a fine-tuned version of BLOOM-560m designed to classify personal experience sharing in social media text. It was developed to explore how different generations (Baby Boomers and Gen X) express themselves on pseudonymous platforms like Reddit.

 ## Model Details

--
--
--
--
+- **Model Type:** Large Language Model (Decoder-only) fine-tuned for sequence classification.
+- **Language:** English.
+- **Finetuned from model:** `bigscience/bloom-560m`.
+- **Application:** Sociotechnical research on digital aging and online self-disclosure.

 ## Intended Use

 ### Primary Task
-
+The model classifies individual sentences into one of four categories to analyze domains of self-disclosure in online forums.

 ### Categories
-*
-*
-*
-*
+* **Health and Wellness (Label 0):** Personal experiences regarding physical/mental health, treatments, or aging-related bodily changes.
+* **Personal Relationships and Identity (Label 1):** Sentences describing social ties, family, friendships, or social identities.
+* **Professional and Financial (Label 2):** Reflections on work, career history, retirement planning, and financial management.
+* **Not Related to Personal Sharing (Label 3):** Non-reflective content, general information, or social pleasantries.

 ## Training Data

-*
-*
-*
-*
+* **Source:** Publicly available posts and comments from the Reddit subreddit `r/AskOldPeople`.
+* **Size:** 2,000 manually labeled sentences (stratified sampling: 500 per category).
+* **Data Split:** 80% Training, 10% Validation, 10% Test.
+* **Preprocessing:** Sentences were tokenized using the Punkt sentence tokenizer.

 ## Performance

-
+The model achieved high accuracy on a held-out test set:

 | Metric | Value |
 | :--- | :--- |
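
The label scheme and the 80/10/10 split described in this change can be sketched in plain Python. This is only an illustration under stated assumptions: the README does not say how the split was drawn, so the per-label (stratified) split, the `stratified_split` helper, the `ID2LABEL` name, the seed, and the placeholder sentences are all hypothetical. Only the four label ids and names, the 2,000-sentence size, and the split ratios come from the text.

```python
import random
from collections import Counter, defaultdict

# Label ids and names as listed under "Categories" in the model card.
ID2LABEL = {
    0: "Health and Wellness",
    1: "Personal Relationships and Identity",
    2: "Professional and Financial",
    3: "Not Related to Personal Sharing",
}


def stratified_split(examples, train_frac=0.8, val_frac=0.1, seed=0):
    """Split (text, label) pairs so every label keeps the same ratio.

    Hypothetical helper: the card gives the ratios but not the procedure.
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_train = round(len(items) * train_frac)
        n_val = round(len(items) * val_frac)
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Placeholder corpus with the card's shape: 500 sentences per category.
examples = [
    (f"placeholder sentence {i} for label {label}", label)
    for label in ID2LABEL
    for i in range(500)
]
train, val, test = stratified_split(examples)
print(len(train), len(val), len(test))  # 1600 200 200
print(Counter(label for _, label in test))
```

With 500 sentences per category, splitting each label separately leaves 400, 50, and 50 sentences per label in the training, validation, and test sets, so the held-out test set stays balanced across the four categories.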