Commit d22fb13 · 1 Parent(s): aa7cac0

Upload 19 files
README.md CHANGED
@@ -166,14 +166,14 @@ Additionally, 294 golden tweets (24.12%) related to the topic of #abortion were
 Before fine-tuning, we built a copy of the dataset by creating an augmentation of each tweet. The augmentation consisted of replacing all the
 topic words and entities in a tweet, and then randomly masking 10% of the words in a tweet, which were then filled in using
 [BERTweet-base](https://huggingface.co/vinai/bertweet-base) as a `fill-mask` model. We chose to mask 10% of the words because this resulted in the
-smallest possible average cosine distance between the tweets and their augmentations of 0.
-
+smallest possible average cosine distance between the tweets and their augmentations (~0.08), which makes the augmentation itself a regularizing
+factor during pre-classification fine-tuning, before any overfitting on the later test data can occur.
 During fine-tuning, we formed pairs by matching each tweet with all remaining tweets in the same data split (training, testing, holdout)
 with similar or dissimilar class labels. For the training and testing sets during the fine-tuning process, we utilized the augmentations, and for the
 holdout tweets, we used their original text to test the fine-tuning process and the usefulness of the augmentations on real tweets.
 For all pairs, we chose the largest possible set so that both similar and dissimilar pairs are equally represented while covering all tweets
 of the respective data split.
-This process created
+This process created 162,064 pairs for training and 71,812 pairs for testing. An additional 53,560 pairs were used for final evaluation with the
 holdout data. Moreover, we utilized `MEAN` pooling, which enhances sentence representations, for fine-tuning.
 
 The model was trained with the parameters:
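For illustration, the masking-and-refilling part of the augmentation described in the hunk above can be sketched as follows. This is a minimal sketch, assuming the Hugging Face `transformers` fill-mask pipeline and `sentence-transformers` for the cosine-distance check; the `augment` helper, the toy tweet, and the omission of the topic-word/entity replacement step are illustrative choices, not taken from the model card.

```python
# Minimal sketch of the masking/refilling augmentation (assumptions noted above).
import random

from transformers import pipeline
from sentence_transformers import SentenceTransformer, util

fill_mask = pipeline("fill-mask", model="vinai/bertweet-base")
MASK = fill_mask.tokenizer.mask_token  # "<mask>" for BERTweet

def augment(tweet: str, mask_ratio: float = 0.10, seed: int = 42) -> str:
    """Randomly mask ~10% of the words and let BERTweet-base fill them back in."""
    random.seed(seed)
    words = tweet.split()
    n_masked = max(1, int(len(words) * mask_ratio))
    for idx in random.sample(range(len(words)), n_masked):
        masked = words.copy()
        masked[idx] = MASK
        best = fill_mask(" ".join(masked), top_k=1)[0]
        words[idx] = best["token_str"].strip()  # subword artifacts ignored for brevity
    return " ".join(words)

tweet = "Climate change is real and we must act now before it is too late"
augmented = augment(tweet)

# Rough check of how far the augmentation drifts from the original tweet;
# the card reports an average cosine distance of ~0.08 for the 10% ratio.
encoder = SentenceTransformer("vinai/bertweet-base")  # mean-pooled embeddings
original_emb, augmented_emb = encoder.encode([tweet, augmented], convert_to_tensor=True)
print(augmented, 1 - util.cos_sim(original_emb, augmented_emb).item())
```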
@@ -215,23 +215,20 @@ Parameters of the fit()-Method:
 
 ## Evaluation Results
 
-
-
+We optimized several BERTweet models with `CLS` or `MEAN` pooling and evaluated them using the `BinaryClassificationEvaluator` of SBERT with
+standard `CLS` tokens for classification, showing:
 
 
 | Model                                   | Precision | Recall  | F1     | Support |
 |-----------------------------------------|-----------|---------|--------|---------|
-| Vanilla BERTweet-`CLS`                  | 50.00%    | 100.00% | 66.67% |
-| Augmented BERTweet-`CLS`                |
-| WRAPresentations-`CLS`                  | 66.00%    | 84.32%  | 74.04% |
-| WRAPresentations-`MEAN` (current model) | 63.05%    | 88.91%  | 73.78% |
-
-
-
-
-Reason, Statement, Notification, and None. As reference, we report the results for Vanilla BERTweet-`CLS`, which is a plain BERTweet-base model, for
-Augmented BERTweet-`CLS`, which was trained on the same augmentations as WRAPresentations-`MEAN` but optimized directly on the classification task, and
-WRAPresentations-`MEAN`, which is the same model as the presented model but with `CLS` pooling during fine-tuning.
+| Vanilla BERTweet-`CLS`                  | 50.00%    | 100.00% | 66.67% | 53,560  |
+| Augmented BERTweet-`CLS`                | 65.69%    | 86.66%  | 74.73% | 53,560  |
+| WRAPresentations-`CLS`                  | 66.00%    | 84.32%  | 74.04% | 53,560  |
+| WRAPresentations-`MEAN` (current model) | 63.05%    | 88.91%  | 73.78% | 53,560  |
+
+The outcomes for WRAPresentations-`MEAN` are influenced by the use of `CLS` pooling during testing, while `MEAN` pooling was employed during
+fine-tuning. Despite this, employing `MEAN` pooling during fine-tuning still improved the `CLS` representation, particularly in terms
+of recall. When WRAPresentations-`MEAN` is tested with `MEAN` pooling, the resulting F1 score stands at 74.07%.
 
 ## Full Model Architecture
 <div align="center">
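The pair construction, `MEAN`-pooling fine-tuning, and SBERT evaluation referenced in the two hunks above can be sketched roughly as below, assuming the `sentence-transformers` API (`models.Pooling`, `BinaryClassificationEvaluator`). The toy tweets, the choice of `ContrastiveLoss`, and the training settings are illustrative assumptions; the actual `fit()` parameters are those listed in the card's own parameter section.

```python
# Rough sketch of pair building, MEAN-pooling fine-tuning, and evaluation
# (loss, data, and hyperparameters are assumptions; see the lead-in above).
from itertools import combinations

from sentence_transformers import InputExample, SentenceTransformer, losses, models
from sentence_transformers.evaluation import BinaryClassificationEvaluator
from torch.utils.data import DataLoader

def build_pairs(tweets, labels):
    """Pair each tweet with every other tweet of the same split:
    label 1.0 for the same argument class (similar), 0.0 otherwise (dissimilar).
    The card additionally balances similar vs. dissimilar pairs; a simple
    down-sampling step (omitted here) would achieve that."""
    return [
        InputExample(texts=[tweets[i], tweets[j]], label=float(labels[i] == labels[j]))
        for i, j in combinations(range(len(tweets)), 2)
    ]

# BERTweet-base wrapped as a SentenceTransformer with MEAN pooling, as in the card.
word_embedding = models.Transformer("vinai/bertweet-base", max_seq_length=128)
pooling = models.Pooling(word_embedding.get_word_embedding_dimension(), pooling_mode="mean")
model = SentenceTransformer(modules=[word_embedding, pooling])

train_tweets = ["Abortion bans harm women", "Bans on abortion hurt women", "Nice weather today"]
train_labels = ["Reason", "Reason", "None"]
train_loader = DataLoader(build_pairs(train_tweets, train_labels), shuffle=True, batch_size=2)
train_loss = losses.ContrastiveLoss(model)  # assumption: any pairwise SBERT loss fits here

# Holdout tweets keep their original text, mirroring the setup described above.
holdout_pairs = build_pairs(["Stop the ban", "End the ban", "I had coffee"], ["Reason", "Reason", "None"])
evaluator = BinaryClassificationEvaluator.from_input_examples(holdout_pairs, name="holdout")

model.fit(train_objectives=[(train_loader, train_loss)], epochs=1, warmup_steps=0, evaluator=evaluator)
```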