Terence Wong

AnitaLeung

AI & ML interests

None yet

Organizations

None yet

Collections 1

ReMix
Reincarnated Mix-policy Proximal Policy Gradient (ReMix), a general approach that enables on-policy RFT methods such as PPO and GRPO to leverage off-policy data.
  • AnitaLeung/Remix-R1-Distilled-Qwen-1.5B

    2B • Updated Aug 29 • 4
  • AnitaLeung/Remix-R1-Distilled-Qwen-7B

    8B • Updated Aug 29 • 6

models 2

AnitaLeung/Remix-R1-Distilled-Qwen-1.5B

2B • Updated Aug 29 • 4

AnitaLeung/Remix-R1-Distilled-Qwen-7B

8B • Updated Aug 29 • 6

datasets 0

None public yet