Commit: fca423b
Parent: b57b9f5

Adding Evaluation Results (#7)

- Adding Evaluation Results (e7128026c902277c6664dd527ac61d251b75dc3c)

Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

README.md CHANGED
    
@@ -74,4 +74,17 @@ The model may not perform effectively outside the scope of the medical domain.
 The training data primarily targets the knowledge level of medical students,
 which may result in limitations when addressing the needs of board-certified physicians.
 The model has not been tested in real-world applications, so its efficacy and accuracy are currently unknown.
-It should never be used as a substitute for a doctor's opinion and must be treated as a research tool only.
+It should never be used as a substitute for a doctor's opinion and must be treated as a research tool only.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_medalpaca__medalpaca-7b)
+
+| Metric                | Value |
+|-----------------------|-------|
+| Avg.                  | 44.98 |
+| ARC (25-shot)         | 54.1  |
+| HellaSwag (10-shot)   | 80.42 |
+| MMLU (5-shot)         | 41.47 |
+| TruthfulQA (0-shot)   | 40.46 |
+| Winogrande (5-shot)   | 71.19 |
+| GSM8K (5-shot)        | 3.03  |
+| DROP (3-shot)         | 24.21 |
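As a sanity check on the table added above (not part of the commit itself): the "Avg." row is the unweighted mean of the seven benchmark scores, and the linked details dataset can be pulled with the `datasets` library. Below is a minimal sketch, assuming the leaderboard's usual conventions; the config enumeration and the "latest" split name are assumptions, not confirmed by this commit.

```python
# Sketch only: re-derive the "Avg." row and fetch the detailed results
# referenced in the README change above.
from datasets import get_dataset_config_names, load_dataset

# Scores copied from the table added in this commit.
scores = {
    "ARC (25-shot)": 54.1,
    "HellaSwag (10-shot)": 80.42,
    "MMLU (5-shot)": 41.47,
    "TruthfulQA (0-shot)": 40.46,
    "Winogrande (5-shot)": 71.19,
    "GSM8K (5-shot)": 3.03,
    "DROP (3-shot)": 24.21,
}

# The leaderboard's "Avg." is the unweighted mean of the task scores.
avg = sum(scores.values()) / len(scores)
print(f"Avg. = {avg:.2f}")  # 44.98, matching the table

repo_id = "open-llm-leaderboard/details_medalpaca__medalpaca-7b"

# The details repo stores one config per evaluation task; enumerate them
# rather than guessing names (the naming scheme is an assumption here).
configs = get_dataset_config_names(repo_id)
print(configs)

# Load one config; "latest" is assumed to be the leaderboard's split name
# for the most recent run. Verify against the printed configs/splits.
details = load_dataset(repo_id, configs[0], split="latest")
print(details[0])
```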

