Update README.md
README.md
CHANGED
@@ -42,12 +42,7 @@ On an A40 (plenty of VRAM), everything except the model identical, the time take
 - 9_2 => 42.8s
 - 9_6 => 48.2s
 
-for comparison
-- bfloat16 (default) =>
-- fp8_e4m3fn =>
-- fp8_e5m2 =>
-
-
+for comparison, the unquantised models take about 27s.
 
 ## How is this optimised?
 