1Teng committed on
Commit 4539ad2 · verified · 1 Parent(s): 9d56363

Update README.md

Files changed (1):
  1. README.md +191 -44

README.md CHANGED
@@ -14,76 +14,83 @@ tags:
 
 # HaS_4.0_0.6B_q0f16-MLC
 
- This repository contains deployable artifacts (weight shards, tokenizer, and inference library) for a Qwen3 0.6B chat model compiled via MLC. The model uses the ChatML dialogue template and FP16 precision (q0f16), with a 4096-token context window.
 
 ---
 
 ## Directory Structure
 
- `mlc-chat-config.json`: MLC chat model configuration (template, default sampling values, context size, etc.)
- `tokenizer.json`, `tokenizer_config.json`, `vocab.json`, `merges.txt`, `added_tokens.json`: tokenizer-related files
- `ndarray-cache.json`: weight shard index and checksum info
- `params_shard_*.bin`: weight shards (FP16)
- `configuration.json`: general metadata (no manual changes needed)
 
 ---
 
 ## Model Specs
 
 - Model type: Qwen3 (`model_type: qwen3`)
- - Approximate parameter scale: 0.6B (total weights ~1.19 GB FP16)
- - Precision and quantization: FP16 (`quantization: q0f16`, BitsPerParam = 16)
 - Architecture:
   - Layers `num_hidden_layers`: 28
   - Hidden size `hidden_size`: 1024
   - Intermediate size `intermediate_size`: 3072 (SwiGLU/SiLU activation)
- - Attention heads `num_attention_heads`: 16; KV heads `num_key_value_heads`: 8
 - Vocab size `vocab_size`: 151,936
 - Context window: 4096 (`context_window_size`)
- - Dialogue template: ChatML (`conv_template: chatml`)
 
- Note: `model_max_length` in `tokenizer_config.json` may be larger than the runtime window, but actual inference uses `context_window_size` from `mlc-chat-config.json`.
 
 ---
 
 ## Quick Start (Local Inference)
 
- Using the MLC LLM CLI, assuming you are at the repository root.
 
 ### 1) Install Dependencies
 
- - Python 3.10+ recommended. Install MLC LLM and related runtimes:
 
 ```bash
 pip install -U mlc-llm mlc-ai
 # For latest features, you can try: pip install --pre -U mlc-llm-nightly mlc-ai-nightly
 ```
 
- Use the official MLC docs and your hardware to install the appropriate backend (Metal/CUDA/Vulkan/CPU). The `model.so` included in this repository targets macOS/Metal.
 
- ### 2) Chat Directly (CLI)
 
 ```bash
 mlc_llm chat --model resolve/main
- # On Apple Silicon, if you need to specify explicitly:
 # mlc_llm chat --model resolve/main --device metal
 ```
 
- Once in the interactive shell, type in Chinese or English to chat.
 
- ### 3) Start a Local Service (Optional)
 
 ```bash
 mlc_llm serve --model resolve/main --host 127.0.0.1 --port 8000
 ```
 
- Refer to the documentation of your installed `mlc-llm` version for specific APIs and options.
 
 ---
 
 ## ChatML Prompt Format
 
- This model uses ChatML. Conversations are composed of `system`/`user`/`assistant` roles with separators. If you construct raw prompts directly (bypassing a chat frontend), you can use:
 
 ```
 <|im_start|>system
@@ -93,46 +100,186 @@ Hello, please introduce yourself.<|im_end|>
 <|im_start|>assistant
 ```
 
- - Each message starts with `<|im_start|>role` and ends with `<|im_end|>` followed by a newline.
- - Generation continues from the final `assistant` line and stops at `<|im_end|>` or stop tokens.
- - You can adjust default `system_message`, stop strings, and sampling parameters in `resolve/main/mlc-chat-config.json` (back up before modifying).
 
 ---
 
- ## Resources and Performance Tips
 
- - Weight size is about 1.2 GB (FP16). Runtime also needs KV cache and temporary buffers.
- - On Apple Silicon, at least 8 GB unified memory is recommended. If you encounter OOM:
   - Reduce `context_window_size` or input length
   - Lower `max_new_tokens` / sampling temperature
- - Disable concurrent sessions or reduce batch size
 
 ---
 
- ## Compatibility and Porting
 
- - Current artifact: `model.so` targets macOS arm64 (Metal).
- - Other platforms (Linux/CUDA, Windows/Vulkan, Android, WebGPU, etc.) require recompiling the same upstream model with the MLC toolchain to produce backend-specific artifacts (`model.so/.wasm`, weight shards, and configuration).
 
- ---
 
- ## Troubleshooting (FAQ)
 
- - Unable to load `model.so` / architecture mismatch:
-   - Ensure you are on macOS with Apple Silicon and using Python/dependencies that match the platform.
- - CLI cannot find the model:
-   - Make sure `--model` points to the directory containing `mlc-chat-config.json` (use `resolve/main` in this repository).
- - Slow initial response:
-   - The first load maps multiple shard files and initializes the graph; this is expected.
- - Unstable output quality:
-   - Tune temperature (e.g., 0.7), `top_p` (e.g., 0.9), or reduce instruction complexity; a 0.6B model is more suitable for lightweight tasks.
 
 ---
 
- ## License and Source
 
- - This repository only contains compiled inference artifacts and configuration; it does not include training code.
- - Model weights and licensing should follow the upstream Qwen3 model and your authorized terms. For commercial use, read and comply with the upstream licenses.
 - Use and redistribution of MLC/TVM must follow their respective open-source licenses.
 
- ---
 
 
 # HaS_4.0_0.6B_q0f16-MLC
 
+ HaS (Hide and Seek) is a data de-identification model developed by Tencent Xuanwu Lab. Its core goal is to identify sensitive entities in text (such as names, addresses, and phone numbers) and replace them with standardized anonymization tags, preserving privacy while keeping the original structure and semantic coherence intact. HaS supports 22 languages, including Chinese, English, French, Japanese, and Korean, and can be deployed server-side for high accuracy or client-side for lightweight operation. This repository provides the lightweight client-side form of the HaS solution: deployable artifacts (weight shards, tokenizer, and inference library) based on Qwen3 0.6B and compiled with MLC, able to run HaS pipeline subtasks such as NER/HIDE/PAIR/SEEK locally, in the browser, or on mobile. The model uses the ChatML conversation template and FP16 precision (q0f16), with a 4096-token context window.
+
+ HaS Overview:
+ - Objective: identify and anonymize sensitive entities such as names, addresses, and phone numbers, while preserving as much of the original structure and semantic coherence as possible.
+ - Languages: 22 languages supported, including Chinese, English, French, Japanese, and Korean.
+ - Model forms: client-side 0.6B/1B models aimed at low latency (this repository is the 0.6B client-side form).
+
+ For more on the pipeline, tag standards, and prompt templates, see the “Processing Pipeline” section below.
 
 ---
 
 ## Directory Structure
 
+ - `mlc-chat-config.json`: MLC chat model config (template, default sampling values, context size, etc.)
+ - `tokenizer.json`, `tokenizer_config.json`, `vocab.json`, `merges.txt`, `added_tokens.json`: tokenizer-related files
+ - `ndarray-cache.json`: weight shard index and verification info
+ - `params_shard_*.bin`: weight shards (FP16)
+ - `configuration.json`: general metadata (no manual changes needed)
 
 ---
 
 ## Model Specs
 
 - Model type: Qwen3 (`model_type: qwen3`)
+ - Approximate parameter count: 0.6B (total weights ~1.19 GB in FP16)
+ - Precision & quantization: FP16 (`quantization: q0f16`, BitsPerParam = 16)
 - Architecture:
   - Layers `num_hidden_layers`: 28
   - Hidden size `hidden_size`: 1024
   - Intermediate size `intermediate_size`: 3072 (SwiGLU/SiLU activation)
+   - Attention heads `num_attention_heads`: 16; KV heads `num_key_value_heads`: 8
 - Vocab size `vocab_size`: 151,936
 - Context window: 4096 (`context_window_size`)
+ - Conversation template: ChatML (`conv_template: chatml`)
 
+ > Note: `tokenizer_config.json` may set `model_max_length` greater than the runtime window, but actual inference follows `context_window_size` in `mlc-chat-config.json`.
 
 ---
 
 ## Quick Start (Local Inference)
 
+ The examples below use the MLC LLM CLI and assume you are at the root of this repository.
 
 ### 1) Install Dependencies
 
+ - Python 3.10+ is recommended. Install MLC LLM and its runtimes:
 
 ```bash
 pip install -U mlc-llm mlc-ai
 # For latest features, you can try: pip install --pre -U mlc-llm-nightly mlc-ai-nightly
 ```
 
+ > Install the appropriate backend (Metal/CUDA/Vulkan/CPU) for your machine following MLC’s official documentation. The `model.so` included here targets macOS/Metal.
 
+ ### 2) Chat (CLI)
 
 ```bash
 mlc_llm chat --model resolve/main
+ # On Apple Silicon, to specify the device explicitly:
 # mlc_llm chat --model resolve/main --device metal
 ```
 
+ In the interactive session, type in Chinese or English to chat. A Python alternative to the CLI is sketched below.
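+
+ If you prefer Python, newer `mlc-llm` builds ship an OpenAI-style engine API. A minimal sketch (verify the class and method names against your installed version):
+
+ ```python
+ from mlc_llm import MLCEngine
+
+ # Point the engine at the directory containing mlc-chat-config.json.
+ engine = MLCEngine("resolve/main")
+
+ # Stream a single chat completion and print it chunk by chunk.
+ for response in engine.chat.completions.create(
+     messages=[{"role": "user", "content": "Hello, please introduce yourself."}],
+     stream=True,
+ ):
+     for choice in response.choices:
+         print(choice.delta.content or "", end="", flush=True)
+ print()
+
+ engine.terminate()
+ ```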
 
+ ### 3) Start a Local Server (Optional)
 
 ```bash
 mlc_llm serve --model resolve/main --host 127.0.0.1 --port 8000
 ```
 
+ Refer to the documentation for your installed `mlc-llm` version for the exact API routes and options.
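+
+ For example, assuming your version exposes the OpenAI-compatible `/v1/chat/completions` route (check `mlc_llm serve --help`), a minimal Python client:
+
+ ```python
+ import requests
+
+ resp = requests.post(
+     "http://127.0.0.1:8000/v1/chat/completions",
+     json={
+         "model": "resolve/main",
+         "messages": [{"role": "user", "content": "Hello, please introduce yourself."}],
+         "temperature": 0.7,
+     },
+     timeout=120,
+ )
+ print(resp.json()["choices"][0]["message"]["content"])
+ ```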
 
 ---
 
 ## ChatML Prompt Format
 
+ This model uses ChatML; conversations consist of `system`/`user`/`assistant` roles and delimiters. If you construct raw prompts directly (bypassing chat frontends), follow this pattern (the system message is elided here; see the helper sketch below):
 
 ```
 <|im_start|>system
 ...
 <|im_start|>user
 Hello, please introduce yourself.<|im_end|>
 <|im_start|>assistant
 ```
 
+ - Each message starts with `<|im_start|>role` and ends with `<|im_end|>` followed by a newline.
+ - Generation continues from the last `assistant` line and stops at `<|im_end|>` or stop tokens.
+ - You can adjust the default `system_message`, stop tokens, and sampling parameters in `resolve/main/mlc-chat-config.json` (back up before editing).
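+
+ A small helper for building such raw prompts (a convenience sketch, not part of this repository):
+
+ ```python
+ def chatml_prompt(system: str, user: str) -> str:
+     # Each turn is "<|im_start|>role\n...<|im_end|>\n"; generation then
+     # continues after the final "<|im_start|>assistant" line.
+     return (
+         f"<|im_start|>system\n{system}<|im_end|>\n"
+         f"<|im_start|>user\n{user}<|im_end|>\n"
+         "<|im_start|>assistant\n"
+     )
+
+ print(chatml_prompt("You are a helpful assistant.", "Hello, please introduce yourself."))
+ ```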
 
 ---
 
+ ## Resources & Performance Tips
 
+ - Weights are ~1.2 GB (FP16); the runtime also needs memory for the KV cache and temporary buffers.
+ - On Apple Silicon, at least 8 GB of unified memory is recommended. If OOM occurs (see the config excerpt after this list):
   - Reduce `context_window_size` or input length
   - Lower `max_new_tokens` / sampling temperature
+   - Disable parallel sessions or reduce batch size
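+
+ For example, to trade context for memory, lower `context_window_size` in `resolve/main/mlc-chat-config.json` (back up the file first; the other fields shown are illustrative defaults, not prescribed values):
+
+ ```json
+ {
+   "context_window_size": 2048,
+   "temperature": 0.7,
+   "top_p": 0.9
+ }
+ ```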
 
 ---
 
+ ## Processing Pipeline
 
+ The end-to-end pipeline consists of several subtasks (a minimal HIDE/SEEK illustration follows the list):
 
+ - NER: Extract the specified entity categories and contents from the input text.
+ - HIDE: Replace sensitive entities with semantic isomorphic tags to anonymize the text.
+ - SPLIT: Split composite tags so that tags and entities correspond one-to-one.
+ - PAIR: Build the mapping between tags and original entities.
+ - SEEK: Restore tags back into the original text based on the mapping.
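+
+ The model performs the hard parts (finding entities and choosing tags); the HIDE/SEEK string mechanics then reduce to substitution over the PAIR mapping. A minimal illustration (entities and tags are made-up examples):
+
+ ```python
+ # Hypothetical PAIR result: tag -> original entity.
+ mapping = {
+     "<Person[1].Name.Full>": "Alice Zhang",
+     "<Phone[1].Mobile>": "+86 138 0000 0000",
+ }
+
+ original = "Alice Zhang can be reached at +86 138 0000 0000."
+
+ # HIDE: replace entities with tags, longest entity first.
+ hidden = original
+ for tag, entity in sorted(mapping.items(), key=lambda kv: -len(kv[1])):
+     hidden = hidden.replace(entity, tag)
+ print(hidden)
+
+ # SEEK: restore tags back to the original entities.
+ restored = hidden
+ for tag, entity in mapping.items():
+     restored = restored.replace(tag, entity)
+ assert restored == original
+ ```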
+
+ ### Typical Applications
+
+ - Cross-border data compliance: compliant anonymization of training data before it leaves a jurisdiction.
+ - Real-time conversation privacy: dynamically remove names, addresses, keys, etc., in chatbots.
+ - Data cleaning and proactive protection: automated privacy cleaning for multilingual business data.
+
+ ### Context Length (By Task)
+
+ The server-side HaS family typically supports up to 128K context for NER/HIDE/PAIR/SEEK. Client-side lightweight models follow `context_window_size` in this repo’s `mlc-chat-config.json` (4096).
+
+ ### Privacy Type Specification
+
+ There are three ways to specify types via the `Specified types` field in prompts:
+
+ 1) All types:
+ ```
+ Specified types: all
+ ```
+ 2) Specific types:
+ ```
+ Specified types: Type1,Type2,...
+ ```
+ 3) Emphasized types (on top of “all”, force replacement of the listed types):
+ ```
+ Specified types: all including Type1,Type2,...
+ ```
+
+ ### Semantic Isomorphic Tags (HaS 4.0)
+
+ - Tag format: `<EntityType[Index].Category.Attribute>`
+ - The same index within a type refers to the same entity; indices across types are unrelated.
+ - Category and attribute use the same language as the entity type; within one input they must be coherent and consistent.
+ - Entities can be hierarchical, composite, and feature-based; the features determine the category and attribute values.
+
+ Replacement principles (excerpt; a worked example follows):
+
+ - Preserve semantic completeness and use the longest match; process only the specified types, with consistent standards; apart from entity replacement, keep all other text (including punctuation, spaces, and fullwidth/halfwidth forms) intact.
+ - If the input contains angle-bracketed text that is not a standardized tag, keep it as normal text.
+ - Pronouns and generic titles are not replaced by default; if they are part of a complete expression, replace them along with the whole.
+ - For hierarchy/inclusion conflicts, prioritize the specified type set and NER granularity; reuse the same index where needed to maintain coreference consistency.
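+
+ An illustrative before/after (entity names, categories, and attributes are hypothetical; the model chooses the actual values):
+
+ ```
+ Input: Zhang San's phone number is 13800000000.
+ HIDE:  <Person[1].Name.Chinese>'s phone number is <PhoneNumber[1].Mobile.CN>.
+ PAIR:  {"<Person[1].Name.Chinese>": ["Zhang San"], "<PhoneNumber[1].Mobile.CN>": ["13800000000"]}
+ ```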
+
+ ### Prompt Templates (Dialog Form; strictly follow the format)
+
+ Named Entity Recognition (NER)
+ ```
+ [
+   {
+     "conversations": [
+       {
+         "from": "human",
+         "value": "Recognize the following entity types in the text.\nSpecified types:[\"Type1\",\"Type2\",...\"]\n<text>{content}</text>"
+       },
+       { "from": "gpt", "value": "{NER result}" }
+     ]
+   }
+ ]
+ ```
+
+ Privacy Anonymization (HIDE)
+ ```
+ [
+   {
+     "conversations": [
+       { "from": "human", "value": "Recognize the following entity types in the text.\nSpecified types:[\"Type1,Type2,...\"]\n<text>{content}</text>" },
+       { "from": "gpt", "value": "{NER result}" },
+       { "from": "human", "value": "Replace the above-mentioned entity types in the text." },
+       { "from": "gpt", "value": "{Hide result}" }
+     ]
+   }
+ ]
+ ```
+
+ Tag Splitting (SPLIT)
+ ```
+ [
+   {
+     "conversations": [
+       { "from": "human", "value": "Split each composite anonymized key into atomic keys.\nComposite mapping:\n{\"tag_id_1tag_id_2\":[\"entity_1entity_2\"]}" },
+       { "from": "gpt", "value": "{\"tag_id_1\":[\"entity_1\"],\"tag_id_2\":[\"entity_2\"]}" }
+     ]
+   }
+ ]
+ ```
 
+ Entity Pairing (PAIR)
+ ```
+ [
+   {
+     "conversations": [
+       { "from": "human", "value": "<original>{content}</original>\n<anonymized>{Hide result}</anonymized>\nExtract the mapping from anonymized entities to original entities." },
+       { "from": "gpt", "value": "{Pair result}" }
+     ]
+   }
+ ]
+ ```
 
+ Entity Restoration (SEEK)
+ ```
+ [
+   {
+     "conversations": [
+       { "from": "human", "value": "The mapping from anonymized entities to original entities:\n{Pair result}\nRestore the original text based on the above mapping:\n{Deepseek result}" },
+       { "from": "gpt", "value": "{Seek result}" }
+     ]
+   }
+ ]
+ ```
+
+ Anonymization with History (leverages historical mapping consistency)
+ ```
+ {
+   "conversations": [
+     {
+       "from": "human",
+       "value": "Recognize the following entity types in the text.\nSpecified types:[\"Type1\",\"Type2\",…]\n<text>{content}</text>"
+     },
+     { "from": "gpt", "value": "{NER result}" },
+     {
+       "from": "human",
+       "value": "Replace the above-mentioned entity types in the text according to the existing mapping pairs:{pair_history}"
+     },
+     { "from": "gpt", "value": "{hide}" }
+   ]
+ }
+ ```
+
+ > Newlines, punctuation, and field names in the templates must be followed exactly; substitute only `{content}` and the type list. A sketch of filling the NER template programmatically follows.
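+
+ A minimal sketch of filling the NER template's user turn in Python (type names are hypothetical; everything except `{content}` and the type list stays verbatim):
+
+ ```python
+ content = "Zhang San's phone number is 13800000000."
+ types = ["Person", "PhoneNumber"]  # hypothetical type names
+
+ # Render the type list exactly as the template shows it: ["Type1","Type2",...]
+ type_list = ",".join(f'"{t}"' for t in types)
+ prompt = (
+     "Recognize the following entity types in the text.\n"
+     f"Specified types:[{type_list}]\n"
+     f"<text>{content}</text>"
+ )
+ print(prompt)
+ ```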
+
+ ### Virtual Tag Inference Engine (System Prompt)
+
+ To prevent the chat model from ignoring tag semantics, declare the following at the system-prompt level (one possible phrasing is sketched after this list):
+
+ - Tag format: `<EntityType[ID].Category.Attribute>`; the same ID within a type refers to the same entity, while IDs across types are unrelated.
+ - Core principles: tag placeholders contain no real names; once provided, mappings persist within the session; response priority follows “original → original + mapping → necessary refusal/insufficient information”; text-transformation tasks should not be refused for lack of mappings; strictly avoid fabricating precise numbers or non-public information.
+ - Refusal categories: identity-based refusal (the task requires a real identity but no mapping exists) and insufficient-information refusal (a mapping exists but information is still insufficient).
+ - Task guidance: Q&A, translation, polishing, summarization, sentiment/classification, etc., should rely only on the original text, existing mappings, and necessary public common knowledge.
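+
+ One possible system-prompt phrasing, condensed from the principles above (illustrative, not an official prompt):
+
+ ```
+ You will see placeholder tags of the form <EntityType[ID].Category.Attribute>.
+ Within a type, the same ID always refers to the same entity; IDs across types are unrelated.
+ Tags contain no real names. Mappings, once provided, persist for this session.
+ Answer from the original text plus existing mappings; never fabricate precise numbers or non-public information.
+ Do not refuse text-transformation tasks (translation, polishing, summarization) for lack of mappings.
+ ```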
+
+ ### Diff Algorithm & Model Fallback
+
+ For pairing entities with tags, a diff-style text comparison can align the original and anonymized texts quickly, greatly reducing pairing and restoration latency. Add a self-check with model fallback for edge cases: if the self-check detects an incorrect pairing, fall back to model-based pairing/restoration to ensure correctness.
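+
+ A toy sketch of the diff-based fast path with a self-check, using Python's difflib (whitespace tokenization and the fallback hook are simplifications, not the actual HaS algorithm):
+
+ ```python
+ import difflib
+
+ original = "Alice Zhang lives in Shenzhen"
+ hidden = "<Person[1].Name.Full> lives in <City[1].Name>"
+
+ # Token-level diff: "replace" opcodes pair each tag with the original span.
+ a, b = original.split(), hidden.split()
+ sm = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
+ pairs = {}
+ for op, a0, a1, b0, b1 in sm.get_opcodes():
+     if op == "replace":
+         pairs[" ".join(b[b0:b1])] = " ".join(a[a0:a1])
+
+ # Self-check: restoring the tags must reproduce the original text;
+ # otherwise fall back to model-based pairing (not shown here).
+ restored = hidden
+ for tag, entity in pairs.items():
+     restored = restored.replace(tag, entity)
+ print(pairs if restored == original else "self-check failed; use model PAIR")
+ ```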
+
+ ### HaS FAQ
+
+ - Must I strictly follow the templates’ newlines/indentation/punctuation?
+   - Yes. Only `{content}` and the type list may be substituted; the format must not be changed. A template self-check tool will be provided later.
+ - In batch data, can the same entity word be replaced with the same target across entries?
+   - Yes. Add the desired “entity → target” mapping at the corresponding position in the template and provide it to HaS 3.0/4.0 to get consistent replacement across multiple inputs.
 
 ---
 
+ ## License & Sources
 
+ - This repository contains only compiled inference artifacts and configuration; it does not include training code.
+ - Model weights and their licenses follow the upstream model (Qwen3) and your authorization terms. For commercial use, read and comply with the upstream licenses.
 - Use and redistribution of MLC/TVM must follow their respective open-source licenses.
 
+ ---