JM-Brun's Collections

RL

updated 25 days ago
  • A Survey of Reinforcement Learning for Large Reasoning Models

    Paper • 2509.08827 • Published Sep 10 • 183 upvotes

  • Language Models Can Learn from Verbal Feedback Without Scalar Rewards

    Paper • 2509.22638 • Published 28 days ago • 67 upvotes