Update README.md
README.md CHANGED

@@ -73,7 +73,9 @@ GPT-4All Benchmark Set
 |piqa         |      0|acc     |0.7922|±  |0.0095|
 |             |       |acc_norm|0.8112|±  |0.0091|
 |winogrande   |      0|acc     |0.7293|±  |0.0125|
-
+Average: 0.7036
+```
+
 AGI-Eval
 ```
 |             Task             |Version| Metric |Value |   |Stderr|
@@ -94,6 +96,7 @@ AGI-Eval
 |                              |       |acc_norm|0.4029|±  |0.0343|
 |agieval_sat_math              |      0|acc     |0.3273|±  |0.0317|
 |                              |       |acc_norm|0.2636|±  |0.0298|
+Average: 0.3556
 ```
 BigBench Reasoning Test
 ```
@@ -118,6 +121,7 @@ BigBench Reasoning Test
 |bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|0.2048|±  |0.0114|
 |bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|0.1297|±  |0.0080|
 |bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|0.4500|±  |0.0288|
+Average: 36.75
 ```

 This is a slight improvement on the GPT4All Suite and BigBench Suite, with a degradation in AGIEval compared to the original Hermes.
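The `Average:` lines added in this commit summarize each suite's result table. As a minimal sketch of how such a figure can be derived (the exact metric choice and row set the commit author averaged over are assumptions here — the hunks above show only a few rows of each table, so this does not reproduce the committed values), one way to average lm-evaluation-harness markdown rows:

```python
# Sketch: compute a suite average from lm-evaluation-harness style
# markdown table rows of the form |task|version|metric|value|± |stderr|.
# Which metric to average (acc vs. acc_norm vs. multiple_choice_grade)
# is an assumption; the harness reports several per task.

def parse_rows(table):
    """Parse pipe-delimited result rows into (task, metric, value) dicts."""
    rows = []
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) >= 4 and cells[2] in ("acc", "acc_norm", "multiple_choice_grade"):
            rows.append({"task": cells[0], "metric": cells[2], "value": float(cells[3])})
    return rows

def average(rows, metric):
    """Unweighted mean of one metric across all parsed rows."""
    vals = [r["value"] for r in rows if r["metric"] == metric]
    return sum(vals) / len(vals)

# The three BigBench rows visible in the hunk above (the full table
# in the README has more rows, so the committed average differs).
bigbench = """
|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|0.2048|±  |0.0114|
|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|0.1297|±  |0.0080|
|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|0.4500|±  |0.0288|
"""
rows = parse_rows(bigbench)
print(round(average(rows, "multiple_choice_grade"), 4))  # mean of the three visible rows
```

Note the committed averages mix scales (0.7036 and 0.3556 as fractions, 36.75 as a percentage); any reproduction script would need to pick one convention.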
