Will it perform better on bfloat16?
#14 opened by madmax0404
Please tell me
It probably depends on your GPUs. There are a lot of variables when optimizing for both speed and quality. Generally speaking, a smaller quantization will be faster, assuming there is a good backend implementation of the kernels for your exact hardware.
Here is what I'm getting on two older CUDA GPUs: https://www.reddit.com/r/LocalLLaMA/comments/1pj9r93/now_40_faster_ik_llamacpp_sm_graph_on_2x_cuda_gpus/
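As a rough rule of thumb, single-stream decode tends to be memory-bandwidth bound, so you can sketch an upper bound on tokens/s yourself. Here's a minimal back-of-envelope Python sketch; the bandwidth and parameter-count numbers are hypothetical placeholders, not measurements from my setup:

```python
# Back-of-envelope sketch (not a benchmark): single-stream decode is usually
# memory-bandwidth bound, so tokens/s is roughly bandwidth / bytes-read-per-token.
# All numbers below are hypothetical placeholders.

GPU_BANDWIDTH_GB_S = 900   # hypothetical ~900 GB/s card
PARAMS_B = 30              # hypothetical 30B-parameter model

for name, bits in [("bf16", 16), ("fp8", 8), ("q4", 4)]:
    model_gb = PARAMS_B * bits / 8          # GB of weights read per generated token
    tok_s = GPU_BANDWIDTH_GB_S / model_gb   # upper bound; ignores KV cache and overhead
    print(f"{name}: ~{model_gb:.0f} GB of weights -> <= {tok_s:.0f} tok/s")
```

That's why the smaller quantization wins in practice, as long as the kernels for your hardware are good.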
This was trained at FP8, so no: BF16 would just be upcasting, and it would therefore run the same.
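If you want to convince yourself, here is a minimal PyTorch sketch (assuming the released weights use the float8_e4m3fn format, and PyTorch >= 2.1 for the float8 dtypes): upcasting FP8 to BF16 is exact, so casting back reproduces the same bits and no precision is gained.

```python
import torch

# Hypothetical example: assume the checkpoint stores weights as float8_e4m3fn.
w_fp8 = torch.randn(4, 4).to(torch.float8_e4m3fn)

# Upcasting to BF16 only re-expresses the same FP8 values in a wider format;
# no precision is recovered.
w_bf16 = w_fp8.to(torch.bfloat16)

# Round-tripping back to FP8 reproduces the exact same bit patterns,
# confirming the upcast added no information.
assert torch.equal(w_bf16.to(torch.float8_e4m3fn).view(torch.uint8),
                   w_fp8.view(torch.uint8))
```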
thanks bro