ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes

| model | size | params | backend | ngl | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: |
| qwen3moe 30B.A3B Q8_0 | 30.25 GiB | 30.53 B | CUDA | 35 | pp8 | 95.03 ± 2.99 |
| qwen3moe 30B.A3B Q8_0 | 30.25 GiB | 30.53 B | CUDA | 35 | tg128 | 31.61 ± 0.21 |

build: 92bb442ad (7040)