Run it on non-Nvidia hardware?
#1 by David341 - opened
Hello, it uses Triton, which requires an Nvidia GPU. Is there a way to run it on an Apple Silicon Mac?
For now, a quick workaround is to use the native PyTorch kernels throughout; for this, one needs to override the default config:
from transformers import AutoModelForCausalLM, AutoTokenizer
from mlstm_kernels.torch.backend_module import mLSTMBackend, mLSTMBackendConfig

device = "cpu"
model_name = "NX-AI/xLSTM-7b"

# Load the model onto the chosen device so the weights and the inputs end up in the same place.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map=device)

# Switch every kernel setting to the pure-PyTorch ("native") implementations,
# which do not require Triton or an Nvidia GPU.
model.config.step_kernel = "native"
model.config.sequence_kernel = "native_sequence__native"
model.config.chunkwise_kernel = "chunkwise--native_custbw"

config = model.config

# Rebuild the backend of every mLSTM block so the new kernel settings take effect.
for block in model.backbone.blocks:
    block.mlstm_layer.mlstm_backend = mLSTMBackend(mLSTMBackendConfig(
        chunkwise_kernel=config.chunkwise_kernel,
        sequence_kernel=config.sequence_kernel,
        step_kernel=config.step_kernel,
        mode=config.mode,
        chunk_size=config.chunk_size,
        return_last_states=config.return_last_states,
        autocast_kernel_dtype=config.autocast_kernel_dtype,
        eps=config.eps,
        inference_state_dtype=config.inference_state_dtype,
    ))

tok = AutoTokenizer.from_pretrained(model_name)

# This prints the raw generated token IDs.
print(model.generate(tok("Hello", return_tensors="pt")["input_ids"].to(device=device)))
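If you would rather see the generated text than the raw token IDs, the output can be decoded with the tokenizer; a minimal sketch:

out = model.generate(tok("Hello", return_tensors="pt")["input_ids"].to(device=device))
# Decode the first (and only) sequence in the batch back into a string.
print(tok.decode(out[0], skip_special_tokens=True))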
However, this has not been tested on an Apple Silicon Mac yet.
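If you do try it on an Apple Silicon Mac, it may also be worth pointing the model at the MPS backend instead of the CPU. This is only an untested assumption, as we have not verified that the native kernels run correctly on MPS:

import torch

# Assumption (untested): the native PyTorch kernels also work on Apple's Metal (MPS) backend.
device = "mps" if torch.backends.mps.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)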
Thank you, I will let you know the result of my test.
Sorry for the long delay. The code was too dependent on Triton, so I ended up spinning up a cloud VM.
