hxa079 RWKV-Transformer Hybrid series (collection)
The HXA079 family of hybrid models combines RWKV recurrent architectures with Transformer-based attention, designed for efficient long-context inference.
Acknowledgment
This project received computational resources and technical support from Recursal.AI. I'm deeply grateful for their support!
This is an experimental model that converts most of the Transformer LLM's attention layers to RWKV linear attention, following the RADLADS method.
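As a rough mental model of what a RADLADS-style conversion involves, the teacher's attention projection weights can be reused to initialize the corresponding RWKV projections before distillation. The sketch below is illustrative only and does not reproduce the actual conversion pipeline; all module names (q_proj, receptance, etc.) are assumptions, not the implementation used for this model.

```python
import torch

@torch.no_grad()
def init_rwkv_from_attention(rwkv_block, attn_block):
    """Copy a teacher attention layer's projections into an RWKV block.

    Hypothetical sketch of the weight-transfer idea behind a RADLADS-style
    conversion; module names are illustrative, not this model's code.
    """
    rwkv_block.receptance.weight.copy_(attn_block.q_proj.weight)
    rwkv_block.key.weight.copy_(attn_block.k_proj.weight)
    rwkv_block.value.weight.copy_(attn_block.v_proj.weight)
    rwkv_block.output.weight.copy_(attn_block.o_proj.weight)
```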
- RWKV Layers: Interleaved RWKV blocks based on the hxa079 design
- Transformer Layers: Placed at strategic depths to enhance long-context performance (see the layer-layout sketch after this list)
- Hybrid Design:
- LoRA Customization:
- RoPE Usage: Enabled (use_rope: true), aligning positional encoding with the RWKV blocks
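To make the interleaving concrete, here is a minimal sketch of how such a hybrid stack could be described. The layer count and the positions of the full-attention layers are assumptions for illustration, not the actual configuration of this checkpoint.

```python
# Illustrative only: neither the depth nor the full-attention positions
# below are the real configuration of RWKV-Seed-OSS-36B-hxa079.
NUM_LAYERS = 64                            # assumed depth for the sketch
FULL_ATTENTION_LAYERS = {15, 31, 47, 63}   # hypothetical "strategic depths"

layer_types = [
    "transformer" if i in FULL_ATTENTION_LAYERS else "rwkv"
    for i in range(NUM_LAYERS)
]

hybrid_config = {
    "layer_types": layer_types,  # interleaved RWKV blocks with sparse full attention
    "use_rope": True,            # matches the card: RoPE enabled across blocks
}
```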
Performance evaluation is ongoing; the model shows promising early results.
Installation requires the flash-linear-attention package:
pip install flash-linear-attention
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "OpenMOSE/RWKV-Seed-OSS-36B-hxa079"

# Load the model; trust_remote_code is required for the custom hxa079 hybrid
# architecture, and dtype/device placement are chosen automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = """There is a very famous song that I recall by the singer's surname as Astley.
I can't remember the name or the youtube URL that people use to link as an example url.
What's the song name?"""
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]

# Build the chat-formatted prompt and tokenize it.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate, then strip the prompt tokens so only the new tokens are decoded.
generated_ids = model.generate(**model_inputs, max_new_tokens=512)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
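For interactive use, the same inputs can also be streamed token by token with the standard transformers TextStreamer. This is generic transformers usage, not something specific to this model.

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)
```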
OpenMOSE - 2025
Base model: ByteDance-Seed/Seed-OSS-36B-Instruct