ernchern committed
Commit eb92401 · verified · 1 Parent(s): 06568e8

Update README.md

Files changed (1)
  1. README.md +15 -15
README.md CHANGED
@@ -23,36 +23,36 @@ model-index:

  # Model Card: BLOOM-560m for Personal Sharing Classification

- [cite_start]This model is a fine-tuned version of [BLOOM-560m](https://huggingface.co/bigscience/bloom-560m) designed to classify personal experience sharing in social media text[cite: 80, 85]. [cite_start]It was developed to explore how different generations (Baby Boomers and Gen X) express themselves on pseudonymous platforms like Reddit[cite: 56, 144].
+ This model is a fine-tuned version of BLOOM-560m designed to classify personal experience sharing in social media text. It was developed to explore how different generations (Baby Boomers and Gen X) express themselves on pseudonymous platforms like Reddit.

  ## Model Details

- - [cite_start]**Model Type:** Large Language Model (Decoder-only) fine-tuned for sequence classification[cite: 80, 85].
- - [cite_start]**Language:** English[cite: 77].
- - [cite_start]**Finetuned from model:** `bigscience/bloom-560m`[cite: 85].
- - [cite_start]**Application:** Sociotechnical research on digital aging and online self-disclosure[cite: 17, 180].
+ - **Model Type:** Large Language Model (Decoder-only) fine-tuned for sequence classification.
+ - **Language:** English.
+ - **Finetuned from model:** `bigscience/bloom-560m`.
+ - **Application:** Sociotechnical research on digital aging and online self-disclosure.

  ## Intended Use

  ### Primary Task
- [cite_start]The model classifies individual sentences into one of four categories to analyze domains of self-disclosure in online forums[cite: 80].
+ The model classifies individual sentences into one of four categories to analyze domains of self-disclosure in online forums.

  ### Categories
- * [cite_start]**Health and Wellness (Label 0):** Personal experiences regarding physical/mental health, treatments, or aging-related bodily changes[cite: 80, 81].
- * [cite_start]**Personal Relationships and Identity (Label 1):** Sentences describing social ties, family, friendships, or social identities[cite: 80, 81].
- * [cite_start]**Professional and Financial (Label 2):** Reflections on work, career history, retirement planning, and financial management[cite: 80, 81].
- * [cite_start]**Not Related to Personal Sharing (Label 3):** Non-reflective content, general information, or social pleasantries (excluded from analysis)[cite: 80, 84].
+ * **Health and Wellness (Label 0):** Personal experiences regarding physical/mental health, treatments, or aging-related bodily changes.
+ * **Personal Relationships and Identity (Label 1):** Sentences describing social ties, family, friendships, or social identities.
+ * **Professional and Financial (Label 2):** Reflections on work, career history, retirement planning, and financial management.
+ * **Not Related to Personal Sharing (Label 3):** Non-reflective content, general information, or social pleasantries.
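
The four categories above map to integer labels 0-3, so the fine-tuned checkpoint can be queried through the standard `transformers` sequence-classification API. Below is a minimal sketch: the repository id is a placeholder (the card excerpt does not name one), and the `id2label` mapping simply mirrors the Categories list.

```python
# Minimal usage sketch. Assumptions: the checkpoint id below is a placeholder,
# and the label order follows the Categories list in the card.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "ernchern/bloom-560m-personal-sharing"  # placeholder repository id

id2label = {
    0: "Health and Wellness",
    1: "Personal Relationships and Identity",
    2: "Professional and Financial",
    3: "Not Related to Personal Sharing",
}

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=4)
model.eval()

sentence = "After I retired, managing my pension became a part-time job in itself."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

predicted = int(logits.argmax(dim=-1))
print(id2label[predicted])  # e.g. "Professional and Financial"
```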
 
  ## Training Data

- * [cite_start]**Source:** Publicly available posts and comments from the Reddit subreddit `r/AskOldPeople`[cite: 65, 76].
- * [cite_start]**Size:** 2,000 manually labeled sentences (stratified sampling: 500 per category)[cite: 86].
- * [cite_start]**Data Split:** 80% Training, 10% Validation, 10% Test[cite: 86].
- * [cite_start]**Preprocessing:** Sentences were tokenized using the Punkt sentence tokenizer[cite: 77].
+ * **Source:** Publicly available posts and comments from the Reddit subreddit `r/AskOldPeople`.
+ * **Size:** 2,000 manually labeled sentences (stratified sampling: 500 per category).
+ * **Data Split:** 80% Training, 10% Validation, 10% Test.
+ * **Preprocessing:** Sentences were tokenized using the Punkt sentence tokenizer.
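
The Training Data entries above describe the preprocessing (Punkt sentence tokenization) and an 80/10/10 split over 2,000 labeled sentences. A rough sketch of that pipeline on toy data is shown below; the use of scikit-learn's `train_test_split` for the stratified split is an assumption, not the card authors' exact procedure.

```python
# Sketch of the described preprocessing and split, on toy data.
# Assumption: scikit-learn's stratified train_test_split stands in for
# whatever splitting procedure the authors actually used.
import nltk
from nltk.tokenize import sent_tokenize
from sklearn.model_selection import train_test_split

nltk.download("punkt")      # classic Punkt sentence tokenizer models
nltk.download("punkt_tab")  # newer NLTK releases look this up instead

post = "I retired at 62. Best decision I ever made. My knees disagree."
sentences = sent_tokenize(post)  # -> three sentences

# Toy labeled pool: 10 sentences per category (labels 0-3), 40 in total.
texts = [f"example sentence {i}" for i in range(40)]
labels = [i % 4 for i in range(40)]

# 80% train, then split the remaining 20% evenly into validation and test.
train_x, rest_x, train_y, rest_y = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)
val_x, test_x, val_y, test_y = train_test_split(
    rest_x, rest_y, test_size=0.5, stratify=rest_y, random_state=42
)
print(len(train_x), len(val_x), len(test_x))  # 32 4 4
```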
 
  ## Performance

- [cite_start]The model achieved high accuracy on a held-out test set[cite: 87]:
+ The model achieved high accuracy on a held-out test set:

  | Metric | Value |
  | :--- | :--- |