Update README.md
    	
README.md CHANGED

@@ -66,10 +66,10 @@ Each separate quant is in a different branch. See below for instructions on fetching from branches.

| Branch | Bits | Group Size | Act Order (desc_act) | File Size | ExLlama Compatible? | Made With | Description |
| ------ | ---- | ---------- | -------------------- | --------- | ------------------- | --------- | ----------- |
| main | 4 | 128 | False | 4.21 GB | False | AutoGPTQ | Most compatible option. Good inference speed in AutoGPTQ and GPTQ-for-LLaMa. Lower inference quality than other options. |
| gptq-4bit-32g-actorder_True | 4 | 32 | True | 4.59 GB | False | AutoGPTQ | 4-bit, with Act Order and group size. 32g gives highest possible inference quality, with maximum VRAM usage. Poor AutoGPTQ CUDA speed. |
| gptq-4bit-64g-actorder_True | 4 | 64 | True | 4.34 GB | False | AutoGPTQ | 4-bit, with Act Order and group size. 64g uses less VRAM than 32g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
| gptq-4bit-128g-actorder_True | 4 | 128 | True | 4.21 GB | False | AutoGPTQ | 4-bit, with Act Order and group size. 128g uses even less VRAM, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
| gptq-8bit--1g-actorder_True | 8 | None | True | 7.33 GB | False | AutoGPTQ | 8-bit, with Act Order. No group size, to lower VRAM requirements and to improve AutoGPTQ speed. |
| gptq-8bit-128g-actorder_False | 8 | 128 | False | 7.47 GB | False | AutoGPTQ | 8-bit, with group size 128g for higher inference quality and without Act Order to improve AutoGPTQ speed. |
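
If you want to fetch a specific branch outside the web UI, a minimal sketch using `huggingface_hub` is shown below. The repo id and branch names come from the table above; the `local_dir` value is only an example.

```python
# Illustrative only: download a single quant branch with huggingface_hub.
# Branch names come from the table above; local_dir is an arbitrary example path.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/Codegen25-7B-mono-GPTQ",
    revision="gptq-4bit-32g-actorder_True",  # omit to download the "main" branch
    local_dir="Codegen25-7B-mono-GPTQ",
)
```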

@@ -88,6 +88,10 @@ Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).

It is strongly recommended to use the text-generation-webui one-click-installers unless you know how to make a manual install.

Please remember to install `tiktoken`, as listed above.

You must also tick **Trust Remote Code**, which means this model is not compatible with ExLlama.

1. Click the **Model tab**.
2. Under **Download custom model or LoRA**, enter `TheBloke/Codegen25-7B-mono-GPTQ`.
   - To download from a specific branch, enter for example `TheBloke/Codegen25-7B-mono-GPTQ:gptq-4bit-32g-actorder_True`.
3. Click **Download**.
4. The model will start downloading. Once it's finished it will say "Done".
5. In the top left, click the refresh icon next to **Model**.
6. Make sure the Loader is set to AutoGPTQ or GPTQ-for-LLaMa, and that Trust Remote Code is ticked.
7. In the **Model** dropdown, choose the model you just downloaded: `Codegen25-7B-mono-GPTQ`.
8. The model will automatically load, and is now ready for use!
9. To save your settings (Loader and Trust Remote Code), click **Save settings for this model** followed by **Reload the Model** in the top right.
   * Note that you do not need to set GPTQ parameters any more. These are set automatically from the file `quantize_config.json` (see the sketch after this list).
10. Once you're ready, click the **Text Generation tab** and enter a prompt to get started!
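
If you want to see the parameters that get picked up automatically, the small sketch below just prints `quantize_config.json`. The path assumes the model was downloaded into text-generation-webui's `models` folder, and the exact keys and values depend on the branch you chose.

```python
# Illustrative only: peek at the GPTQ parameters read from quantize_config.json.
# The path assumes the model sits in text-generation-webui's "models" folder.
import json

with open("models/Codegen25-7B-mono-GPTQ/quantize_config.json") as f:
    print(json.dumps(json.load(f), indent=2))  # e.g. bits, group_size, desc_act
```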

## How to use this GPTQ model from Python code
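
A minimal sketch of the usual AutoGPTQ loading pattern follows, assuming `auto_gptq`, `transformers`, and `tiktoken` are installed; it is an illustration of the general approach rather than the repo's exact example. As noted above, `trust_remote_code=True` is required.

```python
# Minimal sketch (general AutoGPTQ pattern, not necessarily the repo's exact example).
# Assumes auto_gptq, transformers, and tiktoken are installed.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/Codegen25-7B-mono-GPTQ"

# trust_remote_code=True is required for this model, as noted in the steps above.
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    use_safetensors=True,
    trust_remote_code=True,
    device="cuda:0",
)

prompt = "def fibonacci(n):"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda:0")
output = model.generate(input_ids=input_ids, do_sample=True, temperature=0.2, max_new_tokens=128)
print(tokenizer.decode(output[0]))
```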
