mmnga/Mixtral-Fusion-4x7B-Instruct-v0.1

Text Generation

Mixture of Experts

text-generation-inference

mmnga committed on Dec 19, 2023

Commit 9c662bb · 1 Parent(s): 273e426

Update README.md

Files changed (1)

README.md +10 -24

@@ -13,7 +13,8 @@ This model is an experimental model created by merging [mistralai/Mixtral-8x7B-I
 # How we merged experts
 We simply take the average of every two experts.weight.
-The same goes for gate.weight.
+The same goes for gate.weight.
+**Unfortunately, this model has a large hallucination. Look extraction version. -> [mmnga/Mixtral-Extraction-4x7B-Instruct-v0.1](https://huggingface.co/mmnga/Mixtral-Extraction-4x7B-Instruct-v0.1)**
 # How To Convert
 use colab cpu-high-memory.
@@ -34,26 +35,11 @@ model_name_or_path = "mmnga/Mixtral-Fusion-4x7B-Instruct-v0.1"
 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
 model = MixtralForCausalLM.from_pretrained(model_name_or_path, load_in_8bit=True)
-# set num_experts_per_tok 1 or 2 ?
-model.config.num_experts_per_tok = 2
-# message
-messages = [
-    {"role": "user", "content": "Tell me what's for dinner tonight."},
-]
-with torch.no_grad():
-    token_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
-    output_ids = model.generate(
-        token_ids.to(model.device),
-        temperature=0.5,
-        do_sample=True,
-        top_p=0.95,
-        top_k=40,
-        max_new_tokens=128,
-        repetition_penalty=1.5
-    )
-output = tokenizer.decode(output_ids[0][token_ids.size(1) :])
-print(output)
-~~~
+text = "Tell me what's for dinner tonight. "
+inputs = tokenizer(text, return_tensors="pt")
+outputs = model.generate(**inputs, max_new_tokens=128)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+~~~
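
The "How we merged experts" text in this diff says every two `experts.weight` tensors are averaged, and that `gate.weight` is treated the same way. Below is a minimal sketch of that arithmetic, not the author's conversion script. It makes several assumptions: that "every two" means adjacent pairs (0, 1), (2, 3), and so on; that the router weight has one row per expert; and that plain Python lists can stand in for the real tensors. The helper `average_pairs` and the toy values are hypothetical.

```python
def average_pairs(rows):
    """Average adjacent rows pairwise: (0, 1), (2, 3), ...

    Each row is a flat list of floats standing in for one weight tensor.
    The adjacent pairing is an assumption; the README only says "every two".
    """
    if len(rows) % 2 != 0:
        raise ValueError("expected an even number of rows")
    merged = []
    for i in range(0, len(rows), 2):
        first, second = rows[i], rows[i + 1]
        merged.append([(x + y) / 2 for x, y in zip(first, second)])
    return merged


# Toy stand-ins: 8 "experts", each a flattened weight of 3 values.
experts = [[float(e)] * 3 for e in range(8)]
merged_experts = average_pairs(experts)

# Toy router weight with one row per expert (assumed layout).
gate = [[float(e), float(e) + 1.0] for e in range(8)]
merged_gate = average_pairs(gate)

print(len(merged_experts), len(merged_gate))
```

With eight input experts the sketch yields four merged experts and a four-row router, matching the 8x7B to 4x7B naming. With a real checkpoint the same averaging would be applied per layer to each expert's weight tensors.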