mzhaoshuai/zephyr-7b-alpha-conf-refalign

PEFT
Safetensors
mistral
refalign

    RefAlign: RL with Similarity-based Rewards

    GitHub repository: https://github.com/mzhaoshuai/RefAlign

    Paper: Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data.

    Trained with RefAlign on the data in https://huggingface.co/datasets/shuchangtao/CONQORD_dataset/tree/main/conqord_step3_data.
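This card does not spell out how the similarity-based reward is computed, so the following is only a toy illustration of the general idea in the title: score each sampled answer by its similarity to a reference answer and use the mean-centered scores as REINFORCE-style weights. The unigram-overlap F1 metric and the function names `overlap_f1` and `centered_rewards` are assumptions made for this sketch, not taken from the RefAlign repository or paper.

```python
from collections import Counter


def overlap_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two strings (a stand-in similarity metric)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    common = sum((Counter(cand) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(cand)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)


def centered_rewards(samples: list[str], reference: str) -> list[float]:
    """Similarity rewards minus their mean, usable as policy-gradient weights."""
    rewards = [overlap_f1(s, reference) for s in samples]
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]
```

In this sketch, samples closer to the reference get positive weights and the others get negative ones, so no binary preference pairs are needed to rank them.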

    Framework versions

    • PEFT 0.11.1
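Because this repository is a PEFT adapter, one plausible way to load it is through `AutoPeftModelForCausalLM`, which reads the base model name from the adapter's own config. This is a minimal sketch, not an official usage snippet from the authors: the `load_model` helper is hypothetical, and taking the tokenizer from `HuggingFaceH4/zephyr-7b-alpha` is an assumption based on the model tree.

```python
ADAPTER_ID = "mzhaoshuai/zephyr-7b-alpha-conf-refalign"
# Assumption: the tokenizer is shared with the upstream Zephyr model.
TOKENIZER_ID = "HuggingFaceH4/zephyr-7b-alpha"


def load_model(adapter_id: str = ADAPTER_ID, tokenizer_id: str = TOKENIZER_ID):
    """Load the adapter on top of the base model named in its adapter config."""
    # Imported lazily so the heavy dependencies are only needed when called.
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    model = AutoPeftModelForCausalLM.from_pretrained(adapter_id)
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
    return model, tokenizer
```

Calling `model, tokenizer = load_model()` downloads the 7B base weights as well as the adapter, so it needs several gigabytes of disk space and memory.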

    Model tree for mzhaoshuai/zephyr-7b-alpha-conf-refalign

    • Base model: mistralai/Mistral-7B-v0.1
    • Finetuned: HuggingFaceH4/zephyr-7b-alpha
    • Quantized: mzhaoshuai/zephyr-7b-alpha-conf-sft
    • Adapter (1): this model

    Dataset used to train mzhaoshuai/zephyr-7b-alpha-conf-refalign

    shuchangtao/CONQORD_dataset


    Collection including mzhaoshuai/zephyr-7b-alpha-conf-refalign

    RefAlign: RL with Similarity-based Rewards

    Datasets and models from the paper "Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data" (19 items).