1Teng committed on
Commit 4539ad2 · verified · 1 Parent(s): 9d56363

Update README.md

Files changed (1):
  1. README.md +191 -44

README.md CHANGED
@@ -14,76 +14,83 @@ tags:
 
 # HaS_4.0_0.6B_q0f16-MLC
 
- This repository contains deployable artifacts (weight shards, tokenizer, and inference library) for a Qwen3 0.6B chat model compiled via MLC. The model uses the ChatML dialogue template and FP16 precision (q0f16), with a 4096-token context window.
 
 ---
 
 ## Directory Structure
 
- `mlc-chat-config.json`: MLC chat model configuration (template, default sampling values, context size, etc.)
- `tokenizer.json`, `tokenizer_config.json`, `vocab.json`, `merges.txt`, `added_tokens.json`: tokenizer-related files
- `ndarray-cache.json`: weight shard index and checksum info
- `params_shard_*.bin`: weight shards (FP16)
- `configuration.json`: general metadata (no manual changes needed)
 
 ---
 
 ## Model Specs
 
 - Model type: Qwen3 (`model_type: qwen3`)
- - Approximate parameter scale: 0.6B (total weights ~1.19 GB FP16)
- - Precision and quantization: FP16 (`quantization: q0f16`, BitsPerParam = 16)
 - Architecture:
   - Layers `num_hidden_layers`: 28
   - Hidden size `hidden_size`: 1024
   - Intermediate size `intermediate_size`: 3072 (SwiGLU/SiLU activation)
- - Attention heads `num_attention_heads`: 16; KV heads `num_key_value_heads`: 8
 - Vocab size `vocab_size`: 151,936
 - Context window: 4096 (`context_window_size`)
- - Dialogue template: ChatML (`conv_template: chatml`)
 
- Note: `model_max_length` in `tokenizer_config.json` may be larger than the runtime window, but actual inference uses `context_window_size` from `mlc-chat-config.json`.
 
 ---
 
 ## Quick Start (Local Inference)
 
- Using the MLC LLM CLI, assuming you are at the repository root.
 
 ### 1) Install Dependencies
 
- - Python 3.10+ recommended. Install MLC LLM and related runtimes:
 
 ```bash
 pip install -U mlc-llm mlc-ai
 # For latest features, you can try: pip install --pre -U mlc-llm-nightly mlc-ai-nightly
 ```
 
- Use the official MLC docs and your hardware to install the appropriate backend (Metal/CUDA/Vulkan/CPU). The `model.so` included in this repository targets macOS/Metal.
 
- ### 2) Chat Directly (CLI)
 
 ```bash
 mlc_llm chat --model resolve/main
- # On Apple Silicon, if you need to specify explicitly:
 # mlc_llm chat --model resolve/main --device metal
 ```
 
- Once in the interactive shell, type in Chinese or English to chat.
 
- ### 3) Start a Local Service (Optional)
 
 ```bash
 mlc_llm serve --model resolve/main --host 127.0.0.1 --port 8000
 ```
 
- Refer to the documentation of your installed `mlc-llm` version for specific APIs and options.
 
 ---
 
 ## ChatML Prompt Format
 
- This model uses ChatML. Conversations are composed of `system`/`user`/`assistant` roles with separators. If you construct raw prompts directly (bypassing a chat frontend), you can use:
 
 ```
 <|im_start|>system
@@ -93,46 +100,186 @@ Hello, please introduce yourself.<|im_end|>
 <|im_start|>assistant
 ```
 
- - Each message starts with `<|im_start|>role` and ends with `<|im_end|>` followed by a newline.
- - Generation continues from the final `assistant` line and stops at `<|im_end|>` or stop tokens.
- - You can adjust default `system_message`, stop strings, and sampling parameters in `resolve/main/mlc-chat-config.json` (back up before modifying).
 
 ---
 
- ## Resources and Performance Tips
 
- - Weight size is about 1.2 GB (FP16). Runtime also needs KV cache and temporary buffers.
- - On Apple Silicon, at least 8 GB unified memory is recommended. If you encounter OOM:
   - Reduce `context_window_size` or input length
   - Lower `max_new_tokens` / sampling temperature
- - Disable concurrent sessions or reduce batch size
 
 ---
 
- ## Compatibility and Porting
 
- - Current artifact: `model.so` targets macOS arm64 (Metal).
- - Other platforms (Linux/CUDA, Windows/Vulkan, Android, WebGPU, etc.) require recompiling the same upstream model with the MLC toolchain to produce backend-specific artifacts (`model.so/.wasm`, weight shards, and configuration).
 
- ---
 
- ## Troubleshooting (FAQ)
 
- - Unable to load `model.so` / architecture mismatch:
-   - Ensure you are on macOS with Apple Silicon and using Python/dependencies that match the platform.
- - CLI cannot find the model:
-   - Make sure `--model` points to the directory containing `mlc-chat-config.json` (use `resolve/main` in this repository).
- - Slow initial response:
-   - The first load maps multiple shard files and initializes the graph; this is expected.
- - Unstable output quality:
-   - Tune temperature (e.g., 0.7), `top_p` (e.g., 0.9), or reduce instruction complexity; a 0.6B model is more suitable for lightweight tasks.
 
 ---
 
- ## License and Source
 
- - This repository only contains compiled inference artifacts and configuration; it does not include training code.
- - Model weights and licensing should follow the upstream Qwen3 model and your authorized terms. For commercial use, read and comply with the upstream licenses.
 - Use and redistribution of MLC/TVM must follow their respective open-source licenses.
 
- ---
 
 
 # HaS_4.0_0.6B_q0f16-MLC
 
+ HaS (Hide and Seek) is a data de-identification model developed by Tencent Xuanwu Lab. Its core goal is to identify sensitive entities in text (such as names, addresses, and phone numbers) and replace them with standardized anonymization tags, preserving privacy while keeping the original structure and semantic coherence intact. HaS supports 22 languages, including Chinese, English, French, Japanese, and Korean, and can be deployed server-side for high accuracy or client-side for lightweight operation. This repository provides the lightweight client-side form of the HaS solution: deployable artifacts (weight shards, tokenizer, and inference library) based on Qwen3 0.6B and compiled with MLC, able to run HaS pipeline subtasks such as NER/HIDE/PAIR/SEEK locally, in the browser, or on mobile. The model uses the ChatML conversation template and FP16 precision (q0f16), with a 4096-token context window.
+
+ HaS Overview:
+ - Objective: identify and anonymize sensitive entities such as names, addresses, and phone numbers, while preserving as much of the original structure and semantic coherence as possible.
+ - Languages: 22 languages supported, including Chinese, English, French, Japanese, and Korean.
+ - Model forms: client-side 0.6B/1B models aimed at low latency (this repository is the 0.6B client-side form).
+
+ For more on the pipeline, tag standards, and prompt templates, see the “Processing Pipeline” section below.
 
 ---
 
 ## Directory Structure
 
+ - `mlc-chat-config.json`: MLC chat model config (template, default sampling values, context size, etc.)
+ - `tokenizer.json`, `tokenizer_config.json`, `vocab.json`, `merges.txt`, `added_tokens.json`: tokenizer-related files
+ - `ndarray-cache.json`: weight shard index and verification info
+ - `params_shard_*.bin`: weight shards (FP16)
+ - `configuration.json`: general metadata (no manual changes needed)
 
 ---
 
 ## Model Specs
 
 - Model type: Qwen3 (`model_type: qwen3`)
+ - Approximate parameter count: 0.6B (total weights ~1.19 GB in FP16)
+ - Precision & quantization: FP16 (`quantization: q0f16`, BitsPerParam = 16)
 - Architecture:
   - Layers `num_hidden_layers`: 28
   - Hidden size `hidden_size`: 1024
   - Intermediate size `intermediate_size`: 3072 (SwiGLU/SiLU activation)
+   - Attention heads `num_attention_heads`: 16; KV heads `num_key_value_heads`: 8
 - Vocab size `vocab_size`: 151,936
 - Context window: 4096 (`context_window_size`)
+ - Conversation template: ChatML (`conv_template: chatml`)
 
+ > Note: `tokenizer_config.json` may set `model_max_length` greater than the runtime window, but actual inference follows `context_window_size` in `mlc-chat-config.json`.
 
 ---
 
 ## Quick Start (Local Inference)
 
+ The examples below use the MLC LLM CLI and assume you are at the root of this repository.
 
 ### 1) Install Dependencies
 
+ - Python 3.10+ is recommended. Install MLC LLM and its runtimes:
 
 ```bash
 pip install -U mlc-llm mlc-ai
 # For latest features, you can try: pip install --pre -U mlc-llm-nightly mlc-ai-nightly
 ```
 
+ > Install the appropriate backend (Metal/CUDA/Vulkan/CPU) for your machine following MLC’s official documentation. The `model.so` included here targets macOS/Metal.
 
+ ### 2) Chat (CLI)
 
 ```bash
 mlc_llm chat --model resolve/main
+ # On Apple Silicon, to specify the device explicitly:
 # mlc_llm chat --model resolve/main --device metal
 ```
 
+ In the interactive session, type in Chinese or English to chat. A Python alternative to the CLI is sketched below.
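+
+ If you prefer Python, newer `mlc-llm` builds ship an OpenAI-style engine API. A minimal sketch (verify the class and method names against your installed version):
+
+ ```python
+ from mlc_llm import MLCEngine
+
+ # Point the engine at the directory containing mlc-chat-config.json.
+ engine = MLCEngine("resolve/main")
+
+ # Stream a single chat completion and print it chunk by chunk.
+ for response in engine.chat.completions.create(
+     messages=[{"role": "user", "content": "Hello, please introduce yourself."}],
+     stream=True,
+ ):
+     for choice in response.choices:
+         print(choice.delta.content or "", end="", flush=True)
+ print()
+
+ engine.terminate()
+ ```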
 
+ ### 3) Start a Local Server (Optional)
 
 ```bash
 mlc_llm serve --model resolve/main --host 127.0.0.1 --port 8000
 ```
 
+ Refer to the documentation for your installed `mlc-llm` version for the exact API routes and options.
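+
+ For example, assuming your version exposes the OpenAI-compatible `/v1/chat/completions` route (check `mlc_llm serve --help`), a minimal Python client:
+
+ ```python
+ import requests
+
+ resp = requests.post(
+     "http://127.0.0.1:8000/v1/chat/completions",
+     json={
+         "model": "resolve/main",
+         "messages": [{"role": "user", "content": "Hello, please introduce yourself."}],
+         "temperature": 0.7,
+     },
+     timeout=120,
+ )
+ print(resp.json()["choices"][0]["message"]["content"])
+ ```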
 
 ---
 
 ## ChatML Prompt Format
 
+ This model uses ChatML; conversations consist of `system`/`user`/`assistant` roles and delimiters. If you construct raw prompts directly (bypassing chat frontends), follow this pattern (the system message is elided here; see the helper sketch below):
 
 ```
 <|im_start|>system
 ...
 <|im_start|>user
 Hello, please introduce yourself.<|im_end|>
 <|im_start|>assistant
 ```
 
+ - Each message starts with `<|im_start|>role` and ends with `<|im_end|>` followed by a newline.
+ - Generation continues from the last `assistant` line and stops at `<|im_end|>` or stop tokens.
+ - You can adjust the default `system_message`, stop tokens, and sampling parameters in `resolve/main/mlc-chat-config.json` (back up before editing).
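+
+ A small helper for building such raw prompts (a convenience sketch, not part of this repository):
+
+ ```python
+ def chatml_prompt(system: str, user: str) -> str:
+     # Each turn is "<|im_start|>role\n...<|im_end|>\n"; generation then
+     # continues after the final "<|im_start|>assistant" line.
+     return (
+         f"<|im_start|>system\n{system}<|im_end|>\n"
+         f"<|im_start|>user\n{user}<|im_end|>\n"
+         "<|im_start|>assistant\n"
+     )
+
+ print(chatml_prompt("You are a helpful assistant.", "Hello, please introduce yourself."))
+ ```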
 
 ---
 
+ ## Resources & Performance Tips
 
+ - Weights are ~1.2 GB (FP16); the runtime also needs memory for the KV cache and temporary buffers.
+ - On Apple Silicon, at least 8 GB of unified memory is recommended. If OOM occurs (see the config excerpt after this list):
   - Reduce `context_window_size` or input length
   - Lower `max_new_tokens` / sampling temperature
+   - Disable parallel sessions or reduce batch size
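+
+ For example, to trade context for memory, lower `context_window_size` in `resolve/main/mlc-chat-config.json` (back up the file first; the other fields shown are illustrative defaults, not prescribed values):
+
+ ```json
+ {
+   "context_window_size": 2048,
+   "temperature": 0.7,
+   "top_p": 0.9
+ }
+ ```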
 
 ---
 
+ ## Processing Pipeline
 
+ The end-to-end pipeline consists of several subtasks (a minimal HIDE/SEEK illustration follows the list):
 
+ - NER: Extract the specified entity categories and contents from the input text.
+ - HIDE: Replace sensitive entities with semantic isomorphic tags to anonymize the text.
+ - SPLIT: Split composite tags so that tags and entities correspond one-to-one.
+ - PAIR: Build the mapping between tags and original entities.
+ - SEEK: Restore tags back into the original text based on the mapping.
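+
+ The model performs the hard parts (finding entities and choosing tags); the HIDE/SEEK string mechanics then reduce to substitution over the PAIR mapping. A minimal illustration (entities and tags are made-up examples):
+
+ ```python
+ # Hypothetical PAIR result: tag -> original entity.
+ mapping = {
+     "<Person[1].Name.Full>": "Alice Zhang",
+     "<Phone[1].Mobile>": "+86 138 0000 0000",
+ }
+
+ original = "Alice Zhang can be reached at +86 138 0000 0000."
+
+ # HIDE: replace entities with tags, longest entity first.
+ hidden = original
+ for tag, entity in sorted(mapping.items(), key=lambda kv: -len(kv[1])):
+     hidden = hidden.replace(entity, tag)
+ print(hidden)
+
+ # SEEK: restore tags back to the original entities.
+ restored = hidden
+ for tag, entity in mapping.items():
+     restored = restored.replace(tag, entity)
+ assert restored == original
+ ```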
+
+ ### Typical Applications
+
+ - Cross-border data compliance: compliant anonymization of training data before it leaves a jurisdiction.
+ - Real-time conversation privacy: dynamically remove names, addresses, keys, etc., in chatbots.
+ - Data cleaning and proactive protection: automated privacy cleaning for multilingual business data.
+
+ ### Context Length (By Task)
+
+ The server-side HaS family typically supports up to 128K context for NER/HIDE/PAIR/SEEK. Client-side lightweight models follow `context_window_size` in this repo’s `mlc-chat-config.json` (4096).
+
+ ### Privacy Type Specification
+
+ There are three ways to specify types via the `Specified types` field in prompts:
+
+ 1) All types:
+ ```
+ Specified types: all
+ ```
+ 2) Specific types:
+ ```
+ Specified types: Type1,Type2,...
+ ```
+ 3) Emphasized types (on top of “all”, force replacement of the listed types):
+ ```
+ Specified types: all including Type1,Type2,...
+ ```
+
+ ### Semantic Isomorphic Tags (HaS 4.0)
+
+ - Tag format: `<EntityType[Index].Category.Attribute>`
+ - The same index within a type refers to the same entity; indices across types are unrelated.
+ - Category and attribute use the same language as the entity type; within one input they must be coherent and consistent.
+ - Entities can be hierarchical, composite, and feature-based; the features determine the category and attribute values.
+
+ Replacement principles (excerpt; a worked example follows):
+
+ - Preserve semantic completeness and use the longest match; process only the specified types, with consistent standards; apart from entity replacement, keep all other text (including punctuation, spaces, and fullwidth/halfwidth forms) intact.
+ - If the input contains angle-bracketed text that is not a standardized tag, keep it as normal text.
+ - Pronouns and generic titles are not replaced by default; if they are part of a complete expression, replace them along with the whole.
+ - For hierarchy/inclusion conflicts, prioritize the specified type set and NER granularity; reuse the same index where needed to maintain coreference consistency.
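+
+ An illustrative before/after (entity names, categories, and attributes are hypothetical; the model chooses the actual values):
+
+ ```
+ Input: Zhang San's phone number is 13800000000.
+ HIDE:  <Person[1].Name.Chinese>'s phone number is <PhoneNumber[1].Mobile.CN>.
+ PAIR:  {"<Person[1].Name.Chinese>": ["Zhang San"], "<PhoneNumber[1].Mobile.CN>": ["13800000000"]}
+ ```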
+
+ ### Prompt Templates (Dialog Form; strictly follow the format)
+
+ Named Entity Recognition (NER)
+ ```
+ [
+   {
+     "conversations": [
+       {
+         "from": "human",
+         "value": "Recognize the following entity types in the text.\nSpecified types:[\"Type1\",\"Type2\",...\"]\n<text>{content}</text>"
+       },
+       { "from": "gpt", "value": "{NER result}" }
+     ]
+   }
+ ]
+ ```
+
+ Privacy Anonymization (HIDE)
+ ```
+ [
+   {
+     "conversations": [
+       { "from": "human", "value": "Recognize the following entity types in the text.\nSpecified types:[\"Type1,Type2,...\"]\n<text>{content}</text>" },
+       { "from": "gpt", "value": "{NER result}" },
+       { "from": "human", "value": "Replace the above-mentioned entity types in the text." },
+       { "from": "gpt", "value": "{Hide result}" }
+     ]
+   }
+ ]
+ ```
+
+ Tag Splitting (SPLIT)
+ ```
+ [
+   {
+     "conversations": [
+       { "from": "human", "value": "Split each composite anonymized key into atomic keys.\nComposite mapping:\n{\"tag_id_1tag_id_2\":[\"entity_1entity_2\"]}" },
+       { "from": "gpt", "value": "{\"tag_id_1\":[\"entity_1\"],\"tag_id_2\":[\"entity_2\"]}" }
+     ]
+   }
+ ]
+ ```
 
+ Entity Pairing (PAIR)
+ ```
+ [
+   {
+     "conversations": [
+       { "from": "human", "value": "<original>{content}</original>\n<anonymized>{Hide result}</anonymized>\nExtract the mapping from anonymized entities to original entities." },
+       { "from": "gpt", "value": "{Pair result}" }
+     ]
+   }
+ ]
+ ```
 
+ Entity Restoration (SEEK)
+ ```
+ [
+   {
+     "conversations": [
+       { "from": "human", "value": "The mapping from anonymized entities to original entities:\n{Pair result}\nRestore the original text based on the above mapping:\n{Deepseek result}" },
+       { "from": "gpt", "value": "{Seek result}" }
+     ]
+   }
+ ]
+ ```
+
+ Anonymization with History (leverages historical mapping consistency)
+ ```
+ {
+   "conversations": [
+     {
+       "from": "human",
+       "value": "Recognize the following entity types in the text.\nSpecified types:[\"Type1\",\"Type2\",…]\n<text>{content}</text>"
+     },
+     { "from": "gpt", "value": "{NER result}" },
+     {
+       "from": "human",
+       "value": "Replace the above-mentioned entity types in the text according to the existing mapping pairs:{pair_history}"
+     },
+     { "from": "gpt", "value": "{hide}" }
+   ]
+ }
+ ```
+
+ > Newlines, punctuation, and field names in the templates must be followed exactly; substitute only `{content}` and the type list. A sketch of filling the NER template programmatically follows.
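+
+ A minimal sketch of filling the NER template's user turn in Python (type names are hypothetical; everything except `{content}` and the type list stays verbatim):
+
+ ```python
+ content = "Zhang San's phone number is 13800000000."
+ types = ["Person", "PhoneNumber"]  # hypothetical type names
+
+ # Render the type list exactly as the template shows it: ["Type1","Type2",...]
+ type_list = ",".join(f'"{t}"' for t in types)
+ prompt = (
+     "Recognize the following entity types in the text.\n"
+     f"Specified types:[{type_list}]\n"
+     f"<text>{content}</text>"
+ )
+ print(prompt)
+ ```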
+
+ ### Virtual Tag Inference Engine (System Prompt)
+
+ To prevent the chat model from ignoring tag semantics, declare the following at the system-prompt level (one possible phrasing is sketched after this list):
+
+ - Tag format: `<EntityType[ID].Category.Attribute>`; the same ID within a type refers to the same entity, while IDs across types are unrelated.
+ - Core principles: tag placeholders contain no real names; once provided, mappings persist within the session; response priority follows “original → original + mapping → necessary refusal/insufficient information”; text-transformation tasks should not be refused for lack of mappings; strictly avoid fabricating precise numbers or non-public information.
+ - Refusal categories: identity-based refusal (the task requires a real identity but no mapping exists) and insufficient-information refusal (a mapping exists but information is still insufficient).
+ - Task guidance: Q&A, translation, polishing, summarization, sentiment/classification, etc., should rely only on the original text, existing mappings, and necessary public common knowledge.
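+
+ One possible system-prompt phrasing, condensed from the principles above (illustrative, not an official prompt):
+
+ ```
+ You will see placeholder tags of the form <EntityType[ID].Category.Attribute>.
+ Within a type, the same ID always refers to the same entity; IDs across types are unrelated.
+ Tags contain no real names. Mappings, once provided, persist for this session.
+ Answer from the original text plus existing mappings; never fabricate precise numbers or non-public information.
+ Do not refuse text-transformation tasks (translation, polishing, summarization) for lack of mappings.
+ ```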
+
+ ### Diff Algorithm & Model Fallback
+
+ For pairing entities with tags, a diff-style text comparison can align the original and anonymized texts quickly, greatly reducing pairing and restoration latency. Add a self-check with model fallback for edge cases: if the self-check detects an incorrect pairing, fall back to model-based pairing/restoration to ensure correctness.
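+
+ A toy sketch of the diff-based fast path with a self-check, using Python's difflib (whitespace tokenization and the fallback hook are simplifications, not the actual HaS algorithm):
+
+ ```python
+ import difflib
+
+ original = "Alice Zhang lives in Shenzhen"
+ hidden = "<Person[1].Name.Full> lives in <City[1].Name>"
+
+ # Token-level diff: "replace" opcodes pair each tag with the original span.
+ a, b = original.split(), hidden.split()
+ sm = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
+ pairs = {}
+ for op, a0, a1, b0, b1 in sm.get_opcodes():
+     if op == "replace":
+         pairs[" ".join(b[b0:b1])] = " ".join(a[a0:a1])
+
+ # Self-check: restoring the tags must reproduce the original text;
+ # otherwise fall back to model-based pairing (not shown here).
+ restored = hidden
+ for tag, entity in pairs.items():
+     restored = restored.replace(tag, entity)
+ print(pairs if restored == original else "self-check failed; use model PAIR")
+ ```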
+
+ ### HaS FAQ
+
+ - Must I strictly follow the templates’ newlines/indentation/punctuation?
+   - Yes. Only `{content}` and the type list may be substituted; the format must not be changed. A template self-check tool will be provided later.
+ - In batch data, can the same entity word be replaced with the same target across entries?
+   - Yes. Add the desired “entity → target” mapping at the corresponding position in the template and provide it to HaS 3.0/4.0 to get consistent replacement across multiple inputs.
 
 ---
 
+ ## License & Sources
 
+ - This repository contains only compiled inference artifacts and configuration; it does not include training code.
+ - Model weights and their licenses follow the upstream model (Qwen3) and your authorization terms. For commercial use, read and comply with the upstream licenses.
 - Use and redistribution of MLC/TVM must follow their respective open-source licenses.
 
+ ---