Uploaded model
- dev.tsv +0 -0
- final-model.pt +3 -0
- loss.tsv +11 -0
- test.tsv +0 -0
- training.log +499 -0
- weights.txt +0 -0
dev.tsv
ADDED
The diff for this file is too large to render. See raw diff.
final-model.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bc3208dc8d7d34302e550643da037c4e08e941bd59cfe33ec4d4792c5d0bcb61
size 442654125
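final-model.pt is stored through Git LFS, so the diff above is only the pointer file: the SHA-256 of the checkpoint plus its size (442,654,125 bytes, roughly 422 MiB). A minimal sketch of loading it with Flair, assuming the repository was cloned with Git LFS installed (so the real weights are present, not the pointer) and assuming a local clone path of ner-camembert/:

```python
# Minimal sketch: load the uploaded checkpoint and tag a sentence.
# The clone path and the example sentence are assumptions, not from the repo.
from flair.data import Sentence
from flair.models import SequenceTagger

# SequenceTagger.load accepts a path to a .pt checkpoint file
tagger = SequenceTagger.load("ner-camembert/final-model.pt")

sentence = Sentence("George Washington est allé à Washington.")
tagger.predict(sentence)
print(sentence.to_tagged_string())
```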
loss.tsv
ADDED
@@ -0,0 +1,11 @@
EPOCH TIMESTAMP BAD_EPOCHS LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
1 13:32:00 4 0.0001 0.3934547606673132 0.038586683571338654 0.759 0.8903 0.8195 0.7139
2 14:47:41 4 0.0000 0.1352322201632861 0.015217592008411884 0.9081 0.9248 0.9164 0.8626
3 16:03:24 4 0.0000 0.10858782178342327 0.015040190890431404 0.9266 0.9286 0.9276 0.879
4 17:18:55 4 0.0000 0.0878958630160346 0.015710221603512764 0.9289 0.9327 0.9308 0.8838
5 18:33:23 4 0.0000 0.07165857778550887 0.017801353707909584 0.9277 0.9361 0.9319 0.8864
6 19:48:52 4 0.0000 0.05868402400697055 0.018429730087518692 0.9306 0.9438 0.9371 0.8922
7 21:04:47 4 0.0000 0.049209113448846445 0.02109825611114502 0.9344 0.938 0.9362 0.8926
8 22:20:46 4 0.0000 0.042763134030078184 0.02112417109310627 0.9347 0.9446 0.9396 0.8985
9 23:35:14 4 0.0000 0.03838577379283954 0.02171432413160801 0.9391 0.9446 0.9419 0.9008
10 00:49:11 4 0.0000 0.0361115163669216 0.023424603044986725 0.9389 0.9444 0.9417 0.9019
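loss.tsv records one row per epoch. Worth noting: the training loss falls monotonically (0.393 to 0.036) while the dev loss bottoms out around epoch 3 (0.0150) and then creeps upward, even as dev F1 keeps improving to 0.9419 at epoch 9, which by that metric would be the best checkpoint. A minimal sketch for plotting these curves, assuming loss.tsv is tab-separated (as the extension suggests) and sits in the working directory:

```python
# Minimal sketch: plot train vs. dev loss and dev F1 per epoch
# from the loss.tsv shown above.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("loss.tsv", sep="\t")

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# left panel: loss curves
ax1.plot(df["EPOCH"], df["TRAIN_LOSS"], label="train loss")
ax1.plot(df["EPOCH"], df["DEV_LOSS"], label="dev loss")
ax1.set_xlabel("epoch")
ax1.set_ylabel("loss")
ax1.legend()

# right panel: dev micro-F1
ax2.plot(df["EPOCH"], df["DEV_F1"], label="dev F1 (micro)")
ax2.set_xlabel("epoch")
ax2.set_ylabel("F1")
ax2.legend()

fig.tight_layout()
plt.show()
```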
test.tsv
ADDED
The diff for this file is too large to render. See raw diff.
training.log
ADDED
@@ -0,0 +1,499 @@
2022-02-04 12:18:14,159 ----------------------------------------------------------------------------------------------------
2022-02-04 12:18:14,161 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): CamembertModel(
      (embeddings): RobertaEmbeddings(
        (word_embeddings): Embedding(32005, 768, padding_idx=1)
        (position_embeddings): Embedding(514, 768, padding_idx=1)
        (token_type_embeddings): Embedding(1, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): RobertaEncoder(
        (layer): ModuleList(
          (0): RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): RobertaIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (1): RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): RobertaIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (2): RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): RobertaIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (3): RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): RobertaIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (4): RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): RobertaIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (5): RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): RobertaIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (6): RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): RobertaIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (7): RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): RobertaIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (8): RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): RobertaIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (9): RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): RobertaIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (10): RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): RobertaIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (11): RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): RobertaIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): RobertaOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): RobertaPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=18, bias=True)
  (beta): 1.0
  (weights): None
  (weight_tensor) None
)"
2022-02-04 12:18:14,167 ----------------------------------------------------------------------------------------------------
2022-02-04 12:18:14,167 Corpus: "Corpus: 126973 train + 7037 dev + 7090 test sentences"
2022-02-04 12:18:14,167 ----------------------------------------------------------------------------------------------------
2022-02-04 12:18:14,167 Parameters:
2022-02-04 12:18:14,167 - learning_rate: "5e-05"
2022-02-04 12:18:14,167 - mini_batch_size: "16"
2022-02-04 12:18:14,167 - patience: "3"
2022-02-04 12:18:14,167 - anneal_factor: "0.5"
2022-02-04 12:18:14,167 - max_epochs: "10"
2022-02-04 12:18:14,167 - shuffle: "True"
2022-02-04 12:18:14,167 - train_with_dev: "False"
2022-02-04 12:18:14,167 - batch_growth_annealing: "False"
2022-02-04 12:18:14,167 ----------------------------------------------------------------------------------------------------
2022-02-04 12:18:14,167 Model training base path: "resources/taggers/ner-camembert"
2022-02-04 12:18:14,167 ----------------------------------------------------------------------------------------------------
2022-02-04 12:18:14,167 Device: cuda:0
2022-02-04 12:18:14,167 ----------------------------------------------------------------------------------------------------
2022-02-04 12:18:14,167 Embeddings storage mode: none
2022-02-04 12:18:14,170 ----------------------------------------------------------------------------------------------------
2022-02-04 12:25:23,397 epoch 1 - iter 793/7936 - loss 1.64849782 - samples/sec: 29.56 - lr: 0.000005
2022-02-04 12:33:59,649 epoch 1 - iter 1586/7936 - loss 1.11222779 - samples/sec: 24.58 - lr: 0.000010
2022-02-04 12:41:09,132 epoch 1 - iter 2379/7936 - loss 0.85257016 - samples/sec: 29.55 - lr: 0.000015
2022-02-04 12:47:44,896 epoch 1 - iter 3172/7936 - loss 0.71981753 - samples/sec: 32.07 - lr: 0.000020
2022-02-04 12:55:15,449 epoch 1 - iter 3965/7936 - loss 0.60512907 - samples/sec: 28.16 - lr: 0.000025
2022-02-04 13:02:35,238 epoch 1 - iter 4758/7936 - loss 0.52903622 - samples/sec: 28.85 - lr: 0.000030
2022-02-04 13:09:27,012 epoch 1 - iter 5551/7936 - loss 0.48171220 - samples/sec: 30.82 - lr: 0.000035
2022-02-04 13:15:53,083 epoch 1 - iter 6344/7936 - loss 0.44948661 - samples/sec: 32.87 - lr: 0.000040
2022-02-04 13:22:02,650 epoch 1 - iter 7137/7936 - loss 0.42228564 - samples/sec: 34.34 - lr: 0.000045
2022-02-04 13:28:59,445 epoch 1 - iter 7930/7936 - loss 0.39366725 - samples/sec: 30.45 - lr: 0.000050
2022-02-04 13:29:03,026 ----------------------------------------------------------------------------------------------------
2022-02-04 13:29:03,028 EPOCH 1 done: loss 0.3935 - lr 0.0000500
2022-02-04 13:32:00,102 DEV : loss 0.038586683571338654 - f1-score (micro avg) 0.8195
2022-02-04 13:32:00,155 BAD EPOCHS (no improvement): 4
2022-02-04 13:32:00,156 ----------------------------------------------------------------------------------------------------
2022-02-04 13:39:12,612 epoch 2 - iter 793/7936 - loss 0.14931520 - samples/sec: 29.34 - lr: 0.000049
2022-02-04 13:46:36,550 epoch 2 - iter 1586/7936 - loss 0.14672871 - samples/sec: 28.58 - lr: 0.000049
2022-02-04 13:53:49,885 epoch 2 - iter 2379/7936 - loss 0.14547274 - samples/sec: 29.28 - lr: 0.000048
2022-02-04 14:01:13,739 epoch 2 - iter 3172/7936 - loss 0.14418846 - samples/sec: 28.59 - lr: 0.000048
2022-02-04 14:08:30,985 epoch 2 - iter 3965/7936 - loss 0.14265825 - samples/sec: 29.02 - lr: 0.000047
2022-02-04 14:15:46,742 epoch 2 - iter 4758/7936 - loss 0.14086599 - samples/sec: 29.12 - lr: 0.000047
2022-02-04 14:23:11,181 epoch 2 - iter 5551/7936 - loss 0.13927378 - samples/sec: 28.55 - lr: 0.000046
2022-02-04 14:30:19,706 epoch 2 - iter 6344/7936 - loss 0.13799042 - samples/sec: 29.61 - lr: 0.000046
2022-02-04 14:37:30,554 epoch 2 - iter 7137/7936 - loss 0.13666296 - samples/sec: 29.45 - lr: 0.000045
2022-02-04 14:44:52,886 epoch 2 - iter 7930/7936 - loss 0.13525042 - samples/sec: 28.69 - lr: 0.000044
2022-02-04 14:44:56,060 ----------------------------------------------------------------------------------------------------
2022-02-04 14:44:56,062 EPOCH 2 done: loss 0.1352 - lr 0.0000444
2022-02-04 14:47:40,950 DEV : loss 0.015217592008411884 - f1-score (micro avg) 0.9164
2022-02-04 14:47:41,011 BAD EPOCHS (no improvement): 4
2022-02-04 14:47:41,014 ----------------------------------------------------------------------------------------------------
2022-02-04 14:55:04,697 epoch 3 - iter 793/7936 - loss 0.11742558 - samples/sec: 28.60 - lr: 0.000044
2022-02-04 15:02:16,388 epoch 3 - iter 1586/7936 - loss 0.11679901 - samples/sec: 29.40 - lr: 0.000043
2022-02-04 15:09:29,924 epoch 3 - iter 2379/7936 - loss 0.11557918 - samples/sec: 29.27 - lr: 0.000043
2022-02-04 15:16:54,356 epoch 3 - iter 3172/7936 - loss 0.11469700 - samples/sec: 28.55 - lr: 0.000042
2022-02-04 15:24:11,817 epoch 3 - iter 3965/7936 - loss 0.11351908 - samples/sec: 29.01 - lr: 0.000042
2022-02-04 15:31:20,620 epoch 3 - iter 4758/7936 - loss 0.11266101 - samples/sec: 29.59 - lr: 0.000041
2022-02-04 15:38:42,882 epoch 3 - iter 5551/7936 - loss 0.11158730 - samples/sec: 28.69 - lr: 0.000041
2022-02-04 15:45:50,317 epoch 3 - iter 6344/7936 - loss 0.11067669 - samples/sec: 29.69 - lr: 0.000040
2022-02-04 15:53:16,035 epoch 3 - iter 7137/7936 - loss 0.10955013 - samples/sec: 28.47 - lr: 0.000039
2022-02-04 16:00:25,858 epoch 3 - iter 7930/7936 - loss 0.10859645 - samples/sec: 29.52 - lr: 0.000039
2022-02-04 16:00:29,034 ----------------------------------------------------------------------------------------------------
2022-02-04 16:00:29,035 EPOCH 3 done: loss 0.1086 - lr 0.0000389
2022-02-04 16:03:24,201 DEV : loss 0.015040190890431404 - f1-score (micro avg) 0.9276
2022-02-04 16:03:24,261 BAD EPOCHS (no improvement): 4
2022-02-04 16:03:24,262 ----------------------------------------------------------------------------------------------------
2022-02-04 16:10:35,356 epoch 4 - iter 793/7936 - loss 0.09491620 - samples/sec: 29.44 - lr: 0.000038
2022-02-04 16:17:46,476 epoch 4 - iter 1586/7936 - loss 0.09400900 - samples/sec: 29.43 - lr: 0.000038
2022-02-04 16:25:10,503 epoch 4 - iter 2379/7936 - loss 0.09355228 - samples/sec: 28.58 - lr: 0.000037
2022-02-04 16:32:21,829 epoch 4 - iter 3172/7936 - loss 0.09257257 - samples/sec: 29.42 - lr: 0.000037
2022-02-04 16:39:34,717 epoch 4 - iter 3965/7936 - loss 0.09178491 - samples/sec: 29.31 - lr: 0.000036
2022-02-04 16:46:54,536 epoch 4 - iter 4758/7936 - loss 0.09102086 - samples/sec: 28.85 - lr: 0.000036
2022-02-04 16:54:08,674 epoch 4 - iter 5551/7936 - loss 0.09026061 - samples/sec: 29.23 - lr: 0.000035
2022-02-04 17:01:24,799 epoch 4 - iter 6344/7936 - loss 0.08942621 - samples/sec: 29.10 - lr: 0.000034
2022-02-04 17:08:44,577 epoch 4 - iter 7137/7936 - loss 0.08868927 - samples/sec: 28.85 - lr: 0.000034
2022-02-04 17:15:57,678 epoch 4 - iter 7930/7936 - loss 0.08790466 - samples/sec: 29.30 - lr: 0.000033
2022-02-04 17:16:00,787 ----------------------------------------------------------------------------------------------------
2022-02-04 17:16:00,790 EPOCH 4 done: loss 0.0879 - lr 0.0000333
2022-02-04 17:18:55,805 DEV : loss 0.015710221603512764 - f1-score (micro avg) 0.9308
2022-02-04 17:18:55,865 BAD EPOCHS (no improvement): 4
2022-02-04 17:18:55,873 ----------------------------------------------------------------------------------------------------
2022-02-04 17:26:02,969 epoch 5 - iter 793/7936 - loss 0.07683748 - samples/sec: 29.71 - lr: 0.000033
2022-02-04 17:33:13,355 epoch 5 - iter 1586/7936 - loss 0.07621969 - samples/sec: 29.49 - lr: 0.000032
2022-02-04 17:40:38,247 epoch 5 - iter 2379/7936 - loss 0.07573593 - samples/sec: 28.52 - lr: 0.000032
2022-02-04 17:47:40,269 epoch 5 - iter 3172/7936 - loss 0.07524740 - samples/sec: 30.07 - lr: 0.000031
2022-02-04 17:54:59,036 epoch 5 - iter 3965/7936 - loss 0.07449799 - samples/sec: 28.92 - lr: 0.000031
2022-02-04 18:02:03,686 epoch 5 - iter 4758/7936 - loss 0.07405311 - samples/sec: 29.88 - lr: 0.000030
2022-02-04 18:09:11,646 epoch 5 - iter 5551/7936 - loss 0.07340830 - samples/sec: 29.65 - lr: 0.000029
2022-02-04 18:16:27,240 epoch 5 - iter 6344/7936 - loss 0.07271787 - samples/sec: 29.13 - lr: 0.000029
2022-02-04 18:23:29,669 epoch 5 - iter 7137/7936 - loss 0.07217288 - samples/sec: 30.04 - lr: 0.000028
2022-02-04 18:30:30,597 epoch 5 - iter 7930/7936 - loss 0.07166288 - samples/sec: 30.15 - lr: 0.000028
2022-02-04 18:30:33,919 ----------------------------------------------------------------------------------------------------
2022-02-04 18:30:33,920 EPOCH 5 done: loss 0.0717 - lr 0.0000278
2022-02-04 18:33:23,923 DEV : loss 0.017801353707909584 - f1-score (micro avg) 0.9319
2022-02-04 18:33:23,983 BAD EPOCHS (no improvement): 4
2022-02-04 18:33:23,983 ----------------------------------------------------------------------------------------------------
2022-02-04 18:40:28,017 epoch 6 - iter 793/7936 - loss 0.06265627 - samples/sec: 29.93 - lr: 0.000027
2022-02-04 18:47:46,740 epoch 6 - iter 1586/7936 - loss 0.06168821 - samples/sec: 28.92 - lr: 0.000027
2022-02-04 18:54:59,429 epoch 6 - iter 2379/7936 - loss 0.06137959 - samples/sec: 29.33 - lr: 0.000026
2022-02-04 19:02:08,367 epoch 6 - iter 3172/7936 - loss 0.06101991 - samples/sec: 29.58 - lr: 0.000026
2022-02-04 19:09:34,369 epoch 6 - iter 3965/7936 - loss 0.06073221 - samples/sec: 28.45 - lr: 0.000025
2022-02-04 19:16:53,646 epoch 6 - iter 4758/7936 - loss 0.06031513 - samples/sec: 28.89 - lr: 0.000024
2022-02-04 19:24:05,427 epoch 6 - iter 5551/7936 - loss 0.05997466 - samples/sec: 29.39 - lr: 0.000024
2022-02-04 19:31:27,470 epoch 6 - iter 6344/7936 - loss 0.05952743 - samples/sec: 28.71 - lr: 0.000023
2022-02-04 19:38:37,449 epoch 6 - iter 7137/7936 - loss 0.05906427 - samples/sec: 29.51 - lr: 0.000023
2022-02-04 19:46:02,608 epoch 6 - iter 7930/7936 - loss 0.05868560 - samples/sec: 28.51 - lr: 0.000022
2022-02-04 19:46:05,790 ----------------------------------------------------------------------------------------------------
2022-02-04 19:46:05,791 EPOCH 6 done: loss 0.0587 - lr 0.0000222
2022-02-04 19:48:52,058 DEV : loss 0.018429730087518692 - f1-score (micro avg) 0.9371
2022-02-04 19:48:52,117 BAD EPOCHS (no improvement): 4
2022-02-04 19:48:52,118 ----------------------------------------------------------------------------------------------------
2022-02-04 19:56:15,841 epoch 7 - iter 793/7936 - loss 0.05186660 - samples/sec: 28.60 - lr: 0.000022
2022-02-04 20:03:27,574 epoch 7 - iter 1586/7936 - loss 0.05230029 - samples/sec: 29.39 - lr: 0.000021
2022-02-04 20:10:42,349 epoch 7 - iter 2379/7936 - loss 0.05178480 - samples/sec: 29.19 - lr: 0.000021
2022-02-04 20:18:09,822 epoch 7 - iter 3172/7936 - loss 0.05114746 - samples/sec: 28.36 - lr: 0.000020
2022-02-04 20:25:23,574 epoch 7 - iter 3965/7936 - loss 0.05080701 - samples/sec: 29.26 - lr: 0.000019
2022-02-04 20:32:39,287 epoch 7 - iter 4758/7936 - loss 0.05039880 - samples/sec: 29.12 - lr: 0.000019
2022-02-04 20:40:04,807 epoch 7 - iter 5551/7936 - loss 0.05020234 - samples/sec: 28.48 - lr: 0.000018
2022-02-04 20:47:17,356 epoch 7 - iter 6344/7936 - loss 0.04984342 - samples/sec: 29.34 - lr: 0.000018
2022-02-04 20:54:31,673 epoch 7 - iter 7137/7936 - loss 0.04955538 - samples/sec: 29.22 - lr: 0.000017
2022-02-04 21:01:58,187 epoch 7 - iter 7930/7936 - loss 0.04921375 - samples/sec: 28.42 - lr: 0.000017
2022-02-04 21:02:01,071 ----------------------------------------------------------------------------------------------------
2022-02-04 21:02:01,071 EPOCH 7 done: loss 0.0492 - lr 0.0000167
2022-02-04 21:04:47,460 DEV : loss 0.02109825611114502 - f1-score (micro avg) 0.9362
2022-02-04 21:04:47,519 BAD EPOCHS (no improvement): 4
2022-02-04 21:04:47,519 ----------------------------------------------------------------------------------------------------
2022-02-04 21:12:13,992 epoch 8 - iter 793/7936 - loss 0.04468006 - samples/sec: 28.42 - lr: 0.000016
2022-02-04 21:19:25,811 epoch 8 - iter 1586/7936 - loss 0.04434977 - samples/sec: 29.39 - lr: 0.000016
2022-02-04 21:26:35,161 epoch 8 - iter 2379/7936 - loss 0.04431108 - samples/sec: 29.56 - lr: 0.000015
2022-02-04 21:33:55,512 epoch 8 - iter 3172/7936 - loss 0.04408371 - samples/sec: 28.82 - lr: 0.000014
2022-02-04 21:41:09,449 epoch 8 - iter 3965/7936 - loss 0.04390607 - samples/sec: 29.24 - lr: 0.000014
2022-02-04 21:48:30,449 epoch 8 - iter 4758/7936 - loss 0.04368218 - samples/sec: 28.77 - lr: 0.000013
2022-02-04 21:55:47,346 epoch 8 - iter 5551/7936 - loss 0.04350544 - samples/sec: 29.05 - lr: 0.000013
2022-02-04 22:03:02,107 epoch 8 - iter 6344/7936 - loss 0.04321482 - samples/sec: 29.19 - lr: 0.000012
2022-02-04 22:10:29,225 epoch 8 - iter 7137/7936 - loss 0.04299359 - samples/sec: 28.38 - lr: 0.000012
2022-02-04 22:17:46,915 epoch 8 - iter 7930/7936 - loss 0.04275655 - samples/sec: 28.99 - lr: 0.000011
2022-02-04 22:17:50,251 ----------------------------------------------------------------------------------------------------
2022-02-04 22:17:50,252 EPOCH 8 done: loss 0.0428 - lr 0.0000111
2022-02-04 22:20:46,443 DEV : loss 0.02112417109310627 - f1-score (micro avg) 0.9396
2022-02-04 22:20:46,502 BAD EPOCHS (no improvement): 4
2022-02-04 22:20:46,502 ----------------------------------------------------------------------------------------------------
2022-02-04 22:27:54,677 epoch 9 - iter 793/7936 - loss 0.03874630 - samples/sec: 29.64 - lr: 0.000011
2022-02-04 22:35:07,034 epoch 9 - iter 1586/7936 - loss 0.03916791 - samples/sec: 29.35 - lr: 0.000010
2022-02-04 22:42:33,861 epoch 9 - iter 2379/7936 - loss 0.03903771 - samples/sec: 28.40 - lr: 0.000009
2022-02-04 22:49:45,768 epoch 9 - iter 3172/7936 - loss 0.03915089 - samples/sec: 29.38 - lr: 0.000009
2022-02-04 22:56:49,271 epoch 9 - iter 3965/7936 - loss 0.03903752 - samples/sec: 29.96 - lr: 0.000008
2022-02-04 23:04:02,033 epoch 9 - iter 4758/7936 - loss 0.03886980 - samples/sec: 29.32 - lr: 0.000008
2022-02-04 23:11:05,006 epoch 9 - iter 5551/7936 - loss 0.03870274 - samples/sec: 30.00 - lr: 0.000007
2022-02-04 23:18:05,622 epoch 9 - iter 6344/7936 - loss 0.03860323 - samples/sec: 30.17 - lr: 0.000007
2022-02-04 23:25:20,470 epoch 9 - iter 7137/7936 - loss 0.03844156 - samples/sec: 29.18 - lr: 0.000006
2022-02-04 23:32:20,810 epoch 9 - iter 7930/7936 - loss 0.03839073 - samples/sec: 30.19 - lr: 0.000006
2022-02-04 23:32:23,941 ----------------------------------------------------------------------------------------------------
2022-02-04 23:32:23,942 EPOCH 9 done: loss 0.0384 - lr 0.0000056
2022-02-04 23:35:14,351 DEV : loss 0.02171432413160801 - f1-score (micro avg) 0.9419
2022-02-04 23:35:14,411 BAD EPOCHS (no improvement): 4
2022-02-04 23:35:14,412 ----------------------------------------------------------------------------------------------------
2022-02-04 23:42:16,230 epoch 10 - iter 793/7936 - loss 0.03646154 - samples/sec: 30.08 - lr: 0.000005
2022-02-04 23:49:27,305 epoch 10 - iter 1586/7936 - loss 0.03635515 - samples/sec: 29.44 - lr: 0.000004
2022-02-04 23:56:27,850 epoch 10 - iter 2379/7936 - loss 0.03662968 - samples/sec: 30.17 - lr: 0.000004
2022-02-05 00:03:30,598 epoch 10 - iter 3172/7936 - loss 0.03640152 - samples/sec: 30.02 - lr: 0.000003
2022-02-05 00:10:46,058 epoch 10 - iter 3965/7936 - loss 0.03636994 - samples/sec: 29.14 - lr: 0.000003
2022-02-05 00:17:50,999 epoch 10 - iter 4758/7936 - loss 0.03636800 - samples/sec: 29.86 - lr: 0.000002
2022-02-05 00:24:51,167 epoch 10 - iter 5551/7936 - loss 0.03625499 - samples/sec: 30.20 - lr: 0.000002
2022-02-05 00:32:07,970 epoch 10 - iter 6344/7936 - loss 0.03625737 - samples/sec: 29.05 - lr: 0.000001
2022-02-05 00:39:14,867 epoch 10 - iter 7137/7936 - loss 0.03618156 - samples/sec: 29.73 - lr: 0.000001
2022-02-05 00:46:17,991 epoch 10 - iter 7930/7936 - loss 0.03611184 - samples/sec: 29.99 - lr: 0.000000
2022-02-05 00:46:21,120 ----------------------------------------------------------------------------------------------------
2022-02-05 00:46:21,123 EPOCH 10 done: loss 0.0361 - lr 0.0000000
2022-02-05 00:49:11,421 DEV : loss 0.023424603044986725 - f1-score (micro avg) 0.9417
2022-02-05 00:49:11,486 BAD EPOCHS (no improvement): 4
2022-02-05 00:49:12,641 ----------------------------------------------------------------------------------------------------
2022-02-05 00:49:12,643 Testing using last state of model ...
2022-02-05 00:52:03,154 0.9303 0.9309 0.9306 0.8856
2022-02-05 00:52:03,155
Results:
- F-score (micro) 0.9306
- F-score (macro) 0.9057
- Accuracy 0.8856

By class:
              precision    recall  f1-score   support

        pers     0.9373    0.9236    0.9304      2734
         loc     0.9140    0.9371    0.9254      1384
      amount     0.9840    0.9840    0.9840       250
        time     0.9447    0.9407    0.9427       236
        func     0.9209    0.9143    0.9176       140
         org     0.8364    0.9388    0.8846        49
        prod     0.7742    0.8889    0.8276        27
       event     0.8333    0.8333    0.8333        12

   micro avg     0.9303    0.9309    0.9306      4832
   macro avg     0.8931    0.9201    0.9057      4832
weighted avg     0.9307    0.9309    0.9307      4832
 samples avg     0.8856    0.8856    0.8856      4832

2022-02-05 00:52:03,155 ----------------------------------------------------------------------------------------------------
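Taken together, the log shows a Flair SequenceTagger with fine-tuned CamemBERT embeddings and no CRF or RNN (a single Linear(768, 18) output head) trained for 10 epochs on 126,973 sentences, with the learning rate warming up to 5e-5 during epoch 1 and then decaying linearly to zero, finishing at micro-F1 0.9306 on the test set; the constant "BAD EPOCHS: 4" lines suggest the patience-based annealing was inactive under that linear schedule. A minimal sketch of a training script consistent with these parameters; the corpus location, column layout, and Flair version specifics (e.g. make_label_dictionary vs. the older make_tag_dictionary) are assumptions, not taken from the log:

```python
# Minimal sketch of a fine-tuning run matching the logged hyperparameters.
# Paths and the column layout of the CoNLL-style files are assumptions.
from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

columns = {0: "text", 1: "ner"}          # assumed file layout
corpus = ColumnCorpus("data/", columns)  # assumed corpus directory

tag_dictionary = corpus.make_label_dictionary(label_type="ner")

# fine_tune=True makes the CamemBERT weights trainable, as in the log
embeddings = TransformerWordEmbeddings("camembert-base", fine_tune=True)

tagger = SequenceTagger(
    hidden_size=256,                # ignored when use_rnn=False
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type="ner",
    use_crf=False,                  # the logged model has no CRF
    use_rnn=False,                  # and no RNN: linear head only
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/ner-camembert",  # base path from the log
    learning_rate=5e-5,
    mini_batch_size=16,
    max_epochs=10,
)
```

fine_tune uses a linear schedule with warmup by default, which matches the lr trace in the log (ramp to 0.000050 in epoch 1, then steady decay to 0.000000).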
weights.txt
ADDED
File without changes