---
base_model:
- huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated-v2
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- vllm
- unsloth
- abliterated
- uncensored
---

# huihui-ai/Huihui-gpt-oss-20b-mxfp4-abliterated-v2

This is an mxfp4 version of [huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated-v2](https://huggingface.co/huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated-v2).

## QAT

Reference: [OpenAI GPT-OSS Quantization Aware Training (QAT) & Quantized Deployment](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/76e8ce21bf9ce4e0510fea96c998aaee7cfeaf7c/examples/gpt-oss/README.md)

```
pip install nvidia-modelopt[all]

git clone https://github.com/NVIDIA/TensorRT-Model-Optimizer
cd TensorRT-Model-Optimizer/examples/gpt-oss
hf download huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated-v2 --local-dir ./huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated-v2 --exclude "GGUF/*"

python convert_oai_mxfp4_weight_only.py --model_path huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated-v2/ --output_path huihui-ai/Huihui-gpt-oss-20b-mxfp4-abliterated-v2/
```
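
After conversion, a quick sanity check (a minimal sketch; the local directory below is simply the `--output_path` used above, not the Hub ID) is to load the converted config and confirm that quantization metadata was written:

```python
from transformers import AutoConfig

# Local directory produced by convert_oai_mxfp4_weight_only.py above
converted_path = "huihui-ai/Huihui-gpt-oss-20b-mxfp4-abliterated-v2/"

config = AutoConfig.from_pretrained(converted_path)
# An mxfp4 checkpoint is expected to carry a quantization_config entry;
# printing it (or None) shows whether the conversion wrote that metadata.
print(getattr(config, "quantization_config", None))
```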

## Usage
You can use this model in your applications by loading it with Hugging Face's `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch
import os
import signal
import random
import numpy as np
import time
from collections import Counter

cpu_count = os.cpu_count()
print(f"Number of CPU cores in the system: {cpu_count}")
half_cpu_count = cpu_count // 2
os.environ["MKL_NUM_THREADS"] = str(half_cpu_count)
os.environ["OMP_NUM_THREADS"] = str(half_cpu_count)
torch.set_num_threads(half_cpu_count)

print(f"PyTorch threads: {torch.get_num_threads()}")
print(f"MKL threads: {os.getenv('MKL_NUM_THREADS')}")
print(f"OMP threads: {os.getenv('OMP_NUM_THREADS')}")

# Load the model and tokenizer
NEW_MODEL_ID = "huihui-ai/Huihui-gpt-oss-20b-mxfp4-abliterated-v2"
print(f"Load Model {NEW_MODEL_ID} ... ")

model = AutoModelForCausalLM.from_pretrained(
    NEW_MODEL_ID,
    device_map="auto",
    torch_dtype="auto",
)
#print(model)
#print(model.config)

tokenizer = AutoTokenizer.from_pretrained(NEW_MODEL_ID, trust_remote_code=True)

messages = []
skip_prompt = False
skip_special_tokens = False
do_sample = True

class CustomTextStreamer(TextStreamer):
    def __init__(self, tokenizer, skip_prompt=True, skip_special_tokens=True):
        super().__init__(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)
        self.generated_text = ""
        self.stop_flag = False
        self.init_time = time.time()  # Record initialization time
        self.end_time = None  # To store end time
        self.first_token_time = None  # To store first token generation time
        self.token_count = 0  # To track total tokens

    def on_finalized_text(self, text: str, stream_end: bool = False):
        if self.first_token_time is None and text.strip():  # Set first token time on first non-empty text
            self.first_token_time = time.time()
        self.generated_text += text
        # Count tokens in the generated text
        tokens = self.tokenizer.encode(text, add_special_tokens=False)
        self.token_count += len(tokens)
        print(text, end="", flush=True)
        if stream_end:
            self.end_time = time.time()  # Record end time when streaming ends
        if self.stop_flag:
            raise StopIteration

    def stop_generation(self):
        self.stop_flag = True
        self.end_time = time.time()  # Record end time when generation is stopped

    def get_metrics(self):
        """Returns initialization time, first token time, first token latency, end time, total time, total tokens, and tokens per second."""
        if self.end_time is None:
            self.end_time = time.time()  # Set end time if not already set
        total_time = self.end_time - self.init_time  # Total time from init to end
        tokens_per_second = self.token_count / total_time if total_time > 0 else 0
        first_token_latency = (self.first_token_time - self.init_time) if self.first_token_time is not None else None
        metrics = {
            "init_time": self.init_time,
            "first_token_time": self.first_token_time,
            "first_token_latency": first_token_latency,
            "end_time": self.end_time,
            "total_time": total_time,  # Total time in seconds
            "total_tokens": self.token_count,
            "tokens_per_second": tokens_per_second
        }
        return metrics

def generate_stream(model, tokenizer, messages, skip_prompt, skip_special_tokens, do_sample, max_new_tokens):
    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)

    streamer = CustomTextStreamer(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)

    def signal_handler(sig, frame):
        streamer.stop_generation()
        print("\n[Generation stopped by user with Ctrl+C]")

    signal.signal(signal.SIGINT, signal_handler)

    if do_sample:
        generate_kwargs = {
            "do_sample": do_sample,
            "max_new_tokens": max_new_tokens,
            "temperature": 0.7,
            "top_k": 20,
            "top_p": 0.8,
            "repetition_penalty": 1.2,
            "no_repeat_ngram_size": 2
        }
    else:
        generate_kwargs = {
            "do_sample": do_sample,
            "max_new_tokens": max_new_tokens,
            "repetition_penalty": 1.2,
            "no_repeat_ngram_size": 2
        }

    print("Response: ", end="", flush=True)
    try:
        generated_ids = model.generate(
            **input_ids,
            streamer=streamer,
            **generate_kwargs
        )
        del generated_ids
    except StopIteration:
        print("\n[Stopped by user]")

    del input_ids
    torch.cuda.empty_cache()
    signal.signal(signal.SIGINT, signal.SIG_DFL)

    return streamer.generated_text, streamer.stop_flag, streamer.get_metrics()

while True:
    print(f"skip_prompt: {skip_prompt}")
    print(f"skip_special_tokens: {skip_special_tokens}")
    print(f"do_sample: {do_sample}")

    user_input = input("User: ").strip()
    if user_input.lower() == "/exit":
        print("Exiting chat.")
        break
    if user_input.lower() == "/clear":
        messages = []
        print("Chat history cleared. Starting a new conversation.")
        continue
    if user_input.lower() == "/skip_prompt":
        skip_prompt = not skip_prompt
        continue
    if user_input.lower() == "/skip_special_tokens":
        skip_special_tokens = not skip_special_tokens
        continue
    if user_input.lower() == "/do_sample":
        do_sample = not do_sample
        continue
    if not user_input:
        print("Input cannot be empty. Please enter something.")
        continue

    messages.append({"role": "user", "content": user_input})
    response, stop_flag, metrics = generate_stream(model, tokenizer, messages, skip_prompt, skip_special_tokens, do_sample, 40960)
    print("\n\nMetrics:")
    for key, value in metrics.items():
        print(f" {key}: {value}")

    print("", flush=True)
    if stop_flag:
        continue
    messages.append({"role": "assistant", "content": response})
```
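
The model card tags also mention vLLM. Below is a minimal offline-inference sketch, not a verified recipe: it assumes a vLLM build recent enough to support gpt-oss models and MXFP4 checkpoints, and it uses the model's own chat template to build the prompt.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL_ID = "huihui-ai/Huihui-gpt-oss-20b-mxfp4-abliterated-v2"

# Format a single-turn chat prompt with the model's chat template.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize what MXFP4 quantization is in one sentence."}],
    tokenize=False,
    add_generation_prompt=True,
)

# Load the checkpoint with vLLM and generate (assumes gpt-oss/MXFP4 support in your vLLM build).
llm = LLM(model=MODEL_ID)
outputs = llm.generate([prompt], SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512))
print(outputs[0].outputs[0].text)
```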

## Usage Warnings

- **Risk of Sensitive or Controversial Outputs**: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.

- **Not Suitable for All Audiences**: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.

- **Legal and Ethical Responsibilities**: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.

- **Research and Experimental Use**: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.

- **Monitoring and Review Recommendations**: Users are strongly advised to monitor model outputs in real time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.

- **No Default Safety Guarantees**: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.

### Donation
##### Your donation helps us continue further development and improvement; even the price of a cup of coffee makes a difference.
- bitcoin:
```
bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
```
- Support our work on Ko-fi (https://ko-fi.com/huihuiai)!