metadata
			language:
  - en
license: apache-2.0
library_name: transformers
datasets:
  - argilla/distilabel-intel-orca-dpo-pairs
pipeline_tag: question-answering
model-index:
  - name: DistilHermes-2.5-Mistral-7B
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 65.87
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=eren23/DistilHermes-2.5-Mistral-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 84.78
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=eren23/DistilHermes-2.5-Mistral-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 63.65
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=eren23/DistilHermes-2.5-Mistral-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 54.24
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=eren23/DistilHermes-2.5-Mistral-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 78.22
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=eren23/DistilHermes-2.5-Mistral-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 59.82
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=eren23/DistilHermes-2.5-Mistral-7B
          name: Open LLM Leaderboard
DPO Finetuned teknium/OpenHermes-2.5-Mistral-7B using argilla/distilabel-intel-orca-dpo-pairs.
Intel orca dpo pairs is a distilled version: https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs
of https://huggingface.co/datasets/Intel/orca_dpo_pairs
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value | 
|---|---|
| Avg. | 67.76 | 
| AI2 Reasoning Challenge (25-Shot) | 65.87 | 
| HellaSwag (10-Shot) | 84.78 | 
| MMLU (5-Shot) | 63.65 | 
| TruthfulQA (0-shot) | 54.24 | 
| Winogrande (5-shot) | 78.22 | 
| GSM8k (5-shot) | 59.82 | 
