lots-o committed dd2ce20 (verified) · Parent: e18d5f5

Update README.md

Files changed (1): README.md (+333 -1)

pipeline_tag: token-classification
base_model:
- team-lucid/deberta-v3-small-korean
---

## Intro

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64993e268242893df52102a8/cZo8s2oHN8N37iU006XZC.png)

GLiNER is a Named Entity Recognition (NER) model that can identify any entity type using a bidirectional transformer encoder (BERT-like). It provides a practical alternative to two other options. Traditional NER models are limited to predefined entity types. Large Language Models (LLMs) are flexible but too costly and large for resource-constrained scenarios.

This version uses a bi-encoder architecture. The text encoder is [team-lucid/DeBERTa v3 small](https://huggingface.co/team-lucid/deberta-v3-small-korean), and the entity label encoder is the sentence transformer [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3).

This architecture brings several advantages over the uni-encoder GLiNER:

- An unlimited number of entity types can be recognized in a single pass;
- Inference is faster if entity embeddings are precomputed;
- Generalization to unseen entities is better.

However, it also has drawbacks, such as the lack of inter-label interactions. This makes it hard for the model to disambiguate entities that are semantically similar but contextually different.
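The bi-encoder trade-offs above can be illustrated with a deliberately simplified sketch. It is not GLiNER's implementation. The 2-d vectors are made-up stand-ins for learned span and label embeddings, and `score_spans` is a hypothetical helper. Each (span, label) pair is scored independently as a sigmoid of a dot product, in the spirit of GLiNER's span-label matching. As a result, label embeddings can be computed once and reused, but labels never interact with one another.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def score_spans(
    span_embeddings: Dict[str, Vector],
    label_embeddings: Dict[str, Vector],
    threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Score every (span, label) pair independently; keep pairs at or above threshold.

    Label embeddings come from a separate encoder, so they can be computed once
    and reused for any number of texts. Because each label is scored on its own,
    labels never compete with each other (no inter-label interaction).
    """
    results = []
    for span, s_vec in span_embeddings.items():
        for label, l_vec in label_embeddings.items():
            score = sigmoid(dot(s_vec, l_vec))
            if score >= threshold:
                results.append((span, label, round(score, 3)))
    return results


# Toy 2-d embeddings (hypothetical values, for illustration only).
labels = {"person": [2.0, -1.0], "country": [-1.0, 2.0]}
spans = {"Christopher Nolan": [1.5, -0.5], "UK": [-0.5, 1.5]}

print(score_spans(spans, labels, threshold=0.5))
```

Adding a new label here only means adding one more vector to `labels`. The span side does not change, which is the intuition behind the "unlimited labels" and "precomputed embeddings" points.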

- Paper: [https://arxiv.org/abs/2311.08526](https://arxiv.org/abs/2311.08526)
- Repository: [https://github.com/urchade/GLiNER](https://github.com/urchade/GLiNER)
- Service: [https://github.com/henrikalbihn/gliner-as-a-service](https://github.com/henrikalbihn/gliner-as-a-service)

---
## Installation & Usage

Install or update the `gliner` package, and install `python-mecab-ko`:

```bash
# Quote the requirement so the shell does not treat ">=" as an output redirect.
pip install "gliner>=0.2.16"
pip install python-mecab-ko
```
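If you are unsure which version is installed, the following sketch checks it against the 0.2.16 minimum from the command above, using only the standard library. The `version_at_least` helper is hypothetical. It compares dotted numeric versions only and ignores pre-release ordering (for example, `0.2.16rc1` is treated as `0.2.16`).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_at_least(installed: str, minimum: str) -> bool:
    """Return True if `installed` >= `minimum`, comparing dotted numeric parts.

    Only the leading digits of each part are used, so suffixes such as "rc1"
    are ignored; this is a simplified check, not full PEP 440 ordering.
    """
    def to_parts(v: str) -> list:
        parts = []
        for piece in v.split("."):
            match = re.match(r"\d+", piece)
            parts.append(int(match.group()) if match else 0)
        return parts

    a, b = to_parts(installed), to_parts(minimum)
    width = max(len(a), len(b))
    a += [0] * (width - len(a))  # pad so "0.2" compares equal to "0.2.0"
    b += [0] * (width - len(b))
    return a >= b


try:
    installed_version = version("gliner")
    status = "OK" if version_at_least(installed_version, "0.2.16") else "too old"
    print(f"gliner {installed_version}: {status}")
except PackageNotFoundError:
    print("gliner is not installed")
```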

Once the GLiNER library is installed, import the `GLiNER` class, load this model with `GLiNER.from_pretrained`, and predict entities with `predict_entities`.

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("lots-o/gliner-bi-ko-small-v1")

# Sample Korean text about the film director Christopher Nolan
text = """크리스토퍼 놀란(Christopher Nolan) 은 영국의 영화 감독, 각본가, 영화 프로듀서이다. 그의 대표작으로는 2008년 개봉한 《다크 나이트》 시리즈가 있으며, 특히 《다크 나이트》(2008)의 감독으로 가장 유명하다. 이 영화는 배트맨 캐릭터를 중심으로 한 슈퍼히어로 영화로, 히스 레저의 조커 역할이 큰 인기를 끌었다. 또한, 2010년에 개봉한 《인셉션》(2010)은 복잡한 시간과 꿈의 개념을 다룬 SF 영화로, 영화 제작 방식과 스토리 전개에서 혁신적인 접근을 선보였다. 크리스토퍼 놀란은 시간 여행과 다차원적 이야기를 탐구하는 영화들을 통해 현대 영화계에서 중요한 감독으로 자리매김했다.
"""

# Entity types to extract, written as Korean natural-language labels
labels = [
    "영화/소설 작품명",  # movie/novel title
    "사람 이름",  # person name
    "캐릭터 이름",  # character name
    "직업명",  # occupation
    "날짜_연(년)",  # date: year
    "날짜_일",  # date: day
    "날짜_달(월)",  # date: month
    "국가",  # country
]

entities = model.predict_entities(text, labels, threshold=0.2)

for entity in entities:
    print(entity["text"], "=>", entity["label"])
```

```
크리스토퍼 놀란 => 사람 이름
Christopher Nolan => 사람 이름
영국 => 국가
영화 감독 => 직업명
각본가 => 직업명
영화 프로듀서 => 직업명
2008년 => 날짜_연(년)
다크 나이트 => 영화/소설 작품명
다크 나이트 => 영화/소설 작품명
2008 => 날짜_연(년)
감독 => 직업명
배트맨 => 캐릭터 이름
히스 레저 => 사람 이름
조커 => 캐릭터 이름
2010년 => 날짜_연(년)
인셉션 => 영화/소설 작품명
2010 => 날짜_연(년)
크리스토퍼 놀란 => 사람 이름
감독 => 직업명
```
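The raw output above repeats some mentions (for example 크리스토퍼 놀란, 다크 나이트 and 감독), because each occurrence in the text is returned separately. Below is a small post-processing sketch that groups unique mention texts by label. It relies only on the `text` and `label` keys used in the example, and the `group_by_label` helper is hypothetical. `entities` is hard-coded to part of the output above so the snippet runs without downloading the model.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_label(entities: List[dict]) -> Dict[str, List[str]]:
    """Map each label to its unique mention texts, preserving first-seen order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        mentions = grouped[entity["label"]]
        if entity["text"] not in mentions:
            mentions.append(entity["text"])
    return dict(grouped)


# A subset of the predictions printed above (normally: model.predict_entities(...)).
entities = [
    {"text": "크리스토퍼 놀란", "label": "사람 이름"},
    {"text": "다크 나이트", "label": "영화/소설 작품명"},
    {"text": "다크 나이트", "label": "영화/소설 작품명"},
    {"text": "감독", "label": "직업명"},
    {"text": "히스 레저", "label": "사람 이름"},
    {"text": "감독", "label": "직업명"},
]

for label, mentions in group_by_label(entities).items():
    print(label, "=>", ", ".join(mentions))
```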

If you have a large number of entity types and want to pre-embed them, refer to the following snippet:

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("lots-o/gliner-bi-ko-small-v1")

labels = ["your entities"]
texts = ["your texts"]

# Encode the entity labels once; the embeddings can be reused across texts.
entity_embeddings = model.encode_labels(labels, batch_size=8)

outputs = model.batch_predict_with_embeds(texts, entity_embeddings, labels)
```
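For a large collection of texts, one option is to compute `entity_embeddings` once and then feed the texts in fixed-size chunks, reusing the same embeddings for every call. The `chunked` helper below is a hypothetical, standard-library-only sketch. The commented lines show how it could wrap the `batch_predict_with_embeds` call from the snippet above. They assume `model`, `labels`, `texts`, and `entity_embeddings` are defined as there, and that the call returns one result per input text.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items (the last may be shorter)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Usage with the pre-computed label embeddings (requires the model):
# all_outputs = []
# for batch in chunked(texts, 32):
#     # assumes one result per input text in the returned list
#     all_outputs.extend(model.batch_predict_with_embeds(batch, entity_embeddings, labels))

print(list(chunked(["t1", "t2", "t3", "t4", "t5"], 2)))
```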


---

## Dataset
- [국립국어원 모두의 말뭉치](https://kli.korean.go.kr/corpus/main/requestMain.do?lang=ko) (National Institute of Korean Language, Modu Corpus)
- [한국어 중첩 개체명 말뭉치(Korean Nested Named Entity Corpus)](https://github.com/korean-named-entity/konne)

[TTA 150](https://www.korean.go.kr/front/reportData/reportDataView.do?mn_id=207&searchOrder=date&report_seq=1078&pageIndex=1) entity tags are mapped to Korean natural-language labels as follows:

```python
entity_type_mapping = {
    # PS: person
    "PS": {
        "PS_NAME": "인물_사람",
        "PS_CHARACTER": "인물_가상 캐릭터",
        "PS_PET": "인물_반려동물",
    },
    # FD: field of study
    "FD": {
        "FD_SCIENCE": "학문 분야_과학",
        "FD_SOCIAL_SCIENCE": "학문 분야_사회과학",
        "FD_MEDICINE": "학문 분야_의학",
        "FD_ART": "학문 분야_예술",
        "FD_HUMANITIES": "학문 분야_인문학",
        "FD_OTHERS": "학문 분야_기타",
    },
    # TR: theory
    "TR": {
        "TR_SCIENCE": "이론_과학",
        "TR_SOCIAL_SCIENCE": "이론_사회과학",
        "TR_MEDICINE": "이론_의학",
        "TR_ART": "이론_예술",
        "TR_HUMANITIES": "이론_철학/언어/역사",
        "TR_OTHERS": "이론_기타",
    },
    # AF: artifact
    "AF": {
        "AF_BUILDING": "인공물_건축물/토목건설물",
        "AF_CULTURAL_ASSET": "인공물_문화재",
        "AF_ROAD": "인공물_도로/철로",
        "AF_TRANSPORT": "인공물_교통수단/운송수단",
        "AF_MUSICAL_INSTRUMENT": "인공물_악기",
        "AF_WEAPON": "인공물_무기",
        "AFA_DOCUMENT": "인공물_도서/서적 작품명",
        "AFA_PERFORMANCE": "인공물_춤/공연/연극 작품명",
        "AFA_VIDEO": "인공물_영화/TV 프로그램",
        "AFA_ART_CRAFT": "인공물_미술/조형 작품명",
        "AFA_MUSIC": "인공물_음악 작품명",
        "AFW_SERVICE_PRODUCTS": "인공물_서비스 상품",
        "AFW_OTHER_PRODUCTS": "인공물_기타 상품",
    },
    # OG: organization
    "OG": {
        "OGG_ECONOMY": "기관_경제",
        "OGG_EDUCATION": "기관_교육",
        "OGG_MILITARY": "기관_군사",
        "OGG_MEDIA": "기관_미디어",
        "OGG_SPORTS": "기관_스포츠",
        "OGG_ART": "기관_예술",
        "OGG_MEDICINE": "기관_의료",
        "OGG_RELIGION": "기관_종교",
        "OGG_SCIENCE": "기관_과학",
        "OGG_LIBRARY": "기관_도서관",
        "OGG_LAW": "기관_법률",
        "OGG_POLITICS": "기관_정부/공공",
        "OGG_FOOD": "기관_음식 업체",
        "OGG_HOTEL": "기관_숙박 업체",
        "OGG_OTHERS": "기관_기타",
    },
    # LC: location
    "LC": {
        "LCP_COUNTRY": "장소_국가",
        "LCP_PROVINCE": "장소_도/주 지역",
        "LCP_COUNTY": "장소_세부 행정구역",
        "LCP_CITY": "장소_도시",
        "LCP_CAPITALCITY": "장소_수도",
        "LCG_RIVER": "장소_강/호수",
        "LCG_OCEAN": "장소_바다",
        "LCG_BAY": "장소_반도/만",
        "LCG_MOUNTAIN": "장소_산/산맥",
        "LCG_ISLAND": "장소_섬",
        "LCG_CONTINENT": "장소_대륙",
        "LC_SPACE": "장소_천체",
        "LC_OTHERS": "장소_기타",
    },
    # CV: civilization/culture
    "CV": {
        "CV_CULTURE": "문명_문명/문화",
        "CV_TRIBE": "문명_민족/종족",
        "CV_LANGUAGE": "문명_언어",
        "CV_POLICY": "문명_제도/정책",
        "CV_LAW": "문명_법/법률",
        "CV_CURRENCY": "문명_통화",
        "CV_TAX": "문명_조세",
        "CV_FUNDS": "문명_연금/기금",
        "CV_ART": "문명_예술",
        "CV_SPORTS": "문명_스포츠",
        "CV_SPORTS_POSITION": "문명_스포츠 포지션",
        "CV_SPORTS_INST": "문명_스포츠 용품/도구",
        "CV_PRIZE": "문명_상/훈장",
        "CV_RELATION": "문명_가족/친족 관계",
        "CV_OCCUPATION": "문명_직업",
        "CV_POSITION": "문명_직위/직책",
        "CV_FOOD": "문명_음식",
        "CV_DRINK": "문명_음료/술",
        "CV_FOOD_STYLE": "문명_음식 유형",
        "CV_CLOTHING": "문명_의복/섬유",
        "CV_BUILDING_TYPE": "문명_건축 양식",
    },
    # DT: date
    "DT": {
        "DT_DURATION": "날짜_기간",
        "DT_DAY": "날짜_일",
        "DT_WEEK": "날짜_주(주차)",
        "DT_MONTH": "날짜_달(월)",
        "DT_YEAR": "날짜_연(년)",
        "DT_SEASON": "날짜_계절",
        "DT_GEOAGE": "날짜_지질시대",
        "DT_DYNASTY": "날짜_왕조시대",
        "DT_OTHERS": "날짜_기타",
    },
    # TI: time
    "TI": {
        "TI_DURATION": "시간_기간",
        "TI_HOUR": "시간_시각(시)",
        "TI_MINUTE": "시간_분",
        "TI_SECOND": "시간_초",
        "TI_OTHERS": "시간_기타",
    },
    # QT: quantity
    "QT": {
        "QT_AGE": "수량_나이",
        "QT_SIZE": "수량_넓이/면적",
        "QT_LENGTH": "수량_길이/거리",
        "QT_COUNT": "수량_수량/빈도",
        "QT_MAN_COUNT": "수량_인원수",
        "QT_WEIGHT": "수량_무게",
        "QT_PERCENTAGE": "수량_백분율",
        "QT_SPEED": "수량_속도",
        "QT_TEMPERATURE": "수량_온도",
        "QT_VOLUME": "수량_부피",
        "QT_ORDER": "수량_순서",
        "QT_PRICE": "수량_금액",
        "QT_PHONE": "수량_전화번호",
        "QT_SPORTS": "수량_스포츠 수량",
        "QT_CHANNEL": "수량_채널 번호",
        "QT_ALBUM": "수량_앨범 수량",
        "QT_ADDRESS": "수량_주소 관련 숫자",
        "QT_OTHERS": "수량_기타 수량",
    },
    # EV: event
    "EV": {
        "EV_ACTIVITY": "사건_사회운동/선언",
        "EV_WAR_REVOLUTION": "사건_전쟁/혁명",
        "EV_SPORTS": "사건_스포츠 행사",
        "EV_FESTIVAL": "사건_축제/영화제",
        "EV_OTHERS": "사건_기타",
    },
    # AM: animal
    "AM": {
        "AM_INSECT": "동물_곤충",
        "AM_BIRD": "동물_조류",
        "AM_FISH": "동물_어류",
        "AM_MAMMALIA": "동물_포유류",
        "AM_AMPHIBIA": "동물_양서류",
        "AM_REPTILIA": "동물_파충류",
        "AM_TYPE": "동물_분류명",
        "AM_PART": "동물_부위명",
        "AM_OTHERS": "동물_기타",
    },
    # PT: plant
    "PT": {
        "PT_FRUIT": "식물_과일/열매",
        "PT_FLOWER": "식물_꽃",
        "PT_TREE": "식물_나무",
        "PT_GRASS": "식물_풀",
        "PT_TYPE": "식물_분류명",
        "PT_PART": "식물_부위명",
        "PT_OTHERS": "식물_기타",
    },
    # MT: material
    "MT": {
        "MT_ELEMENT": "물질_원소",
        "MT_METAL": "물질_금속",
        "MT_ROCK": "물질_암석",
        "MT_CHEMICAL": "물질_화학",
    },
    # TM: term
    "TM": {
        "TM_COLOR": "용어_색깔",
        "TM_DIRECTION": "용어_방향",
        "TM_CLIMATE": "용어_기후 지역",
        "TM_SHAPE": "용어_모양/형태",
        "TM_CELL_TISSUE_ORGAN": "용어_세포/조직/기관",
        "TMM_DISEASE": "용어_증상/질병",
        "TMM_DRUG": "용어_약품",
        "TMI_HW": "용어_IT 하드웨어",
        "TMI_SW": "용어_IT 소프트웨어",
        "TMI_SITE": "용어_URL 주소",
        "TMI_EMAIL": "용어_이메일 주소",
        "TMI_MODEL": "용어_제품 모델명",
        "TMI_SERVICE": "용어_IT 서비스",
        "TMI_PROJECT": "용어_프로젝트",
        "TMIG_GENRE": "용어_게임 장르",
        "TM_SPORTS": "용어_스포츠",
    },
}
```
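
One way to query the model with the full TTA label set, and then map predictions back to TTA tags, is to flatten `entity_type_mapping` into a label list plus a reverse lookup. This is a sketch, not part of the original pipeline. `flatten_mapping` is a hypothetical helper, and a small excerpt of the mapping stands in for the full dictionary so the snippet runs on its own.

```python
from typing import Dict, List, Tuple


def flatten_mapping(
    mapping: Dict[str, Dict[str, str]],
) -> Tuple[List[str], Dict[str, str]]:
    """Return (labels, label_to_tag) from a {group: {tag: label}} mapping."""
    labels: List[str] = []
    label_to_tag: Dict[str, str] = {}
    for tags in mapping.values():
        for tag, label in tags.items():
            # The reverse lookup only works if every label text is unique.
            if label in label_to_tag:
                raise ValueError(f"duplicate label {label!r} for {tag} and {label_to_tag[label]}")
            labels.append(label)
            label_to_tag[label] = tag
    return labels, label_to_tag


# Excerpt of entity_type_mapping; pass the full dictionary in practice.
excerpt = {
    "PS": {"PS_NAME": "인물_사람", "PS_CHARACTER": "인물_가상 캐릭터"},
    "DT": {"DT_YEAR": "날짜_연(년)"},
}

labels, label_to_tag = flatten_mapping(excerpt)
print(labels)
print(label_to_tag["날짜_연(년)"])
```

The full mapping flattens to 150 labels. At that size, the pre-embedding approach from the usage section is likely a better fit. After prediction, `label_to_tag[entity["label"]]` recovers the TTA tag for each entity.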

## Evaluation

Evaluated on the [konne dev set](https://github.com/korean-named-entity/konne); `t` is the prediction threshold. Except for the values provided by this model's author, the results in the table below are taken from [taeminlee/gliner_ko](https://huggingface.co/taeminlee/gliner_ko).

| Model                          | Precision (P) | Recall (R) | F1         |
| :----------------------------: | :-----------: | :--------: | :--------: |
| gliner-bi-ko-small-v1 (t=0.5)  | 81.53%        | 74.16%     | 77.67%     |
| gliner-bi-ko-xlarge-v1 (t=0.5) | **84.73%**    | 77.71%     | **81.07%** |
| Gliner-ko (t=0.5)              | 72.51%        | **79.82%** | 75.99%     |
| Gliner Large-v2 (t=0.5)        | 34.33%        | 19.50%     | 24.87%     |
| Gliner Multi (t=0.5)           | 40.94%        | 34.18%     | 37.26%     |
| Pororo                         | 70.25%        | 57.94%     | 63.50%     |
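
As a sanity check on the table, F1 is the harmonic mean of precision and recall: F1 = 2PR / (P + R). The sketch below recomputes F1 from the reported P and R, with values copied from the table. Each result agrees with the reported F1 to two decimal places.

```python
# (precision %, recall %, reported F1 %) copied from the table above
results = {
    "gliner-bi-ko-small-v1 (t=0.5)": (81.53, 74.16, 77.67),
    "gliner-bi-ko-xlarge-v1 (t=0.5)": (84.73, 77.71, 81.07),
    "Gliner-ko (t=0.5)": (72.51, 79.82, 75.99),
    "Gliner Large-v2 (t=0.5)": (34.33, 19.50, 24.87),
    "Gliner Multi (t=0.5)": (40.94, 34.18, 37.26),
    "Pororo": (70.25, 57.94, 63.50),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for model_name, (p, r, reported) in results.items():
    print(f"{model_name}: computed {f1(p, r):.2f}, reported {reported:.2f}")
```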


---

## Citation
```bibtex
@misc{gliner_bi_ko_small_v1,
  title={gliner-bi-ko-small-v1},
  author={Gihwan Kim},
  year={2025},
  url={https://huggingface.co/lots-o/gliner-bi-ko-small-v1},
  publisher={Hugging Face}
}
```

```bibtex
@misc{zaratiana2023gliner,
  title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
  author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
  year={2023},
  eprint={2311.08526},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```