Run it on non-Nvidia hardware?
#1 by David341 - opened
Hello, it uses Triton, which requires an Nvidia GPU. Is there a way to run it on an Apple Silicon Mac?
For now, a quick workaround is to use the native PyTorch kernels throughout; for this, one needs to override the default config:
from transformers import AutoModelForCausalLM, AutoTokenizer
from mlstm_kernels.torch.backend_module import mLSTMBackend, mLSTMBackendConfig

device = "cpu"
model_name = "NX-AI/xLSTM-7b"

# Load the model onto the chosen device so the weights and the inputs end up in the same place.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map=device)

# Switch every kernel setting to the pure-PyTorch ("native") implementations,
# which do not require Triton or an Nvidia GPU.
model.config.step_kernel = "native"
model.config.sequence_kernel = "native_sequence__native"
model.config.chunkwise_kernel = "chunkwise--native_custbw"

config = model.config

# Rebuild the backend of every mLSTM block so the new kernel settings take effect.
for block in model.backbone.blocks:
    block.mlstm_layer.mlstm_backend = mLSTMBackend(mLSTMBackendConfig(
        chunkwise_kernel=config.chunkwise_kernel,
        sequence_kernel=config.sequence_kernel,
        step_kernel=config.step_kernel,
        mode=config.mode,
        chunk_size=config.chunk_size,
        return_last_states=config.return_last_states,
        autocast_kernel_dtype=config.autocast_kernel_dtype,
        eps=config.eps,
        inference_state_dtype=config.inference_state_dtype,
    ))

tok = AutoTokenizer.from_pretrained(model_name)

# This prints the raw generated token IDs.
print(model.generate(tok("Hello", return_tensors="pt")["input_ids"].to(device=device)))
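If you would rather see the generated text than the raw token IDs, the output can be decoded with the tokenizer; a minimal sketch:

out = model.generate(tok("Hello", return_tensors="pt")["input_ids"].to(device=device))
# Decode the first (and only) sequence in the batch back into a string.
print(tok.decode(out[0], skip_special_tokens=True))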
However, this has not been tested on an Apple Silicon Mac yet.
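If you do try it on an Apple Silicon Mac, it may also be worth pointing the model at the MPS backend instead of the CPU. This is only an untested assumption, as we have not verified that the native kernels run correctly on MPS:

import torch

# Assumption (untested): the native PyTorch kernels also work on Apple's Metal (MPS) backend.
device = "mps" if torch.backends.mps.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)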
Thank you, I will let you know the result of my test.
Sorry for the long delay. The code was too dependent on Triton, so I ended up spinning up a cloud VM.
