	---
	license: mit
	base_model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
	tags:
	- text-generation
	- lmul
	- research
	- experimental
	- qwen3
	---

	# L-Mul Optimized: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B

	This is a modified version of DeepSeek AI's [DeepSeek-R1-0528-Qwen3-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B) model. The modification replaces the standard attention mechanism with one that uses a custom, approximate matrix multiplication algorithm termed "L-Mul".

	This work was performed as part of a research project to evaluate the performance and accuracy trade-offs of algorithmic substitutions in transformer architectures.

	This model is intended strictly for educational and scientific purposes.

	## Model Description

	The core architecture of `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B` is preserved. However, the standard `Qwen3Attention` modules have been dynamically replaced with a custom version that utilizes the `l_mul_attention` function for its core computations. This function is defined in the `lmul.py` file included in this repository.

	- Base Model: [deepseek-ai/DeepSeek-R1-0528-Qwen3-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B)
	- Modification: Replacement of standard attention with L-Mul approximate attention.
	- Primary Use-Case: Research and educational analysis of algorithmic impact on LLMs.
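
The contents of `lmul.py` are not reproduced in this card, so the following is only a minimal sketch of what an approximate multiplication can look like. It assumes an L-Mul-style scheme in which each operand is written as `(1 + f) * 2**e` and the product of the two fractions `fx * fy` is replaced by a small constant, so the mantissa is formed by additions alone. The names `approx_mul`, `approx_dot`, and the `offset_bits` parameter are hypothetical and are not taken from this repository.

```python
import math


def approx_mul(x: float, y: float, offset_bits: int = 4) -> float:
    """Approximate x * y without multiplying the mantissa fractions (illustrative only)."""
    if x == 0.0 or y == 0.0:
        return 0.0
    sign = -1.0 if (x < 0) != (y < 0) else 1.0
    # math.frexp gives m in [0.5, 1) with |x| = m * 2**e.
    # Rewrite as (1 + f) * 2**(e - 1) with f in [0, 1).
    mx, ex = math.frexp(abs(x))
    my, ey = math.frexp(abs(y))
    fx, fy = 2.0 * mx - 1.0, 2.0 * my - 1.0
    # Exact mantissa product: 1 + fx + fy + fx * fy.
    # Assumption: the fx * fy term is replaced by the constant 2**-offset_bits.
    mantissa = 1.0 + fx + fy + 2.0 ** -offset_bits
    return sign * math.ldexp(mantissa, (ex - 1) + (ey - 1))


def approx_dot(a: list[float], b: list[float]) -> float:
    """Dot product built from approx_mul, the building block of a matrix product."""
    return sum(approx_mul(u, v) for u, v in zip(a, b))


print(approx_mul(3.0, 5.0))  # 14.5 (exact product: 15.0)
print(approx_dot([1.0, 2.0], [3.0, 4.0]), "vs exact", 1.0 * 3.0 + 2.0 * 4.0)
```

In a sketch like this, every product in the attention score and value matrices would carry a small per-element error, which is the kind of accuracy trade-off this model is meant to help study.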

	## How to Get Started

	To use this model, you must pass `trust_remote_code=True` when loading it. This allows `transformers` to execute the custom `lmul.py` file that defines the new attention mechanism.

	You can load the model directly from this repository using the `transformers` library:

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	# Define the repository ID for the specific model
	repo_id = "Peacemann/deepseek-ai_DeepSeek-R1-0528-Qwen3-8B_LMUL" # Replace with the correct repo ID if different

	# Load the tokenizer and model, trusting the remote code to load lmul.py
	tokenizer = AutoTokenizer.from_pretrained(repo_id)
	model = AutoModelForCausalLM.from_pretrained(
	    repo_id,
	    trust_remote_code=True,
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)

	# Example usage
	prompt = "The L-Mul algorithm is an experimental method for..."
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	outputs = model.generate(**inputs, max_new_tokens=50)

	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	For high-throughput inference, you can use `vLLM`:

	```python
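	from vllm import LLM, SamplingParams

	repo_id = "Peacemann/deepseek-ai_DeepSeek-R1-0528-Qwen3-8B_LMUL" # Replace with the correct repo ID
	llm = LLM(model=repo_id, trust_remote_code=True)
	# Generate a short completion for a single prompt
	outputs = llm.generate(["The L-Mul algorithm is an experimental method for..."], SamplingParams(max_tokens=50))
	print(outputs[0].outputs[0].text)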
	```

	## Intended Uses & Limitations

	This model is intended for researchers and students exploring the internal workings of LLMs. It is a tool for visualizing and analyzing the effects of fundamental algorithmic changes.

	This model is NOT intended for any commercial or production application.

	The modification is experimental. The impact on the model's performance, safety alignment, accuracy, and potential for generating biased or harmful content is unknown and untested. It inherits all limitations and biases of the original `DeepSeek-R1-0528-Qwen3-8B` model, and its behavior may be altered in unpredictable ways.

	## Licensing Information

	The use of this model is subject to the original MIT License. By using this model, you agree to the terms outlined in the license. The license can be found on the base model's Hugging Face page.