Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference Paper • 2306.12509 • Published Jun 21, 2023 • 14
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization Paper • 2403.17031 • Published Mar 24, 2024 • 6
Generative Verifiers: Reward Modeling as Next-Token Prediction Paper • 2408.15240 • Published Aug 27, 2024 • 13
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling Paper • 2408.16737 • Published Aug 29, 2024 • 1
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models Paper • 2410.18252 • Published Oct 23, 2024 • 7
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers Paper • 2505.04842 • Published May 7 • 12
Multi-Turn Puzzles: Evaluating Interactive Reasoning and Strategic Dialogue in LLMs Paper • 2508.10142 • Published Aug 13 • 3
ReasoningMila/ServiceNowAI_R1_Distill_SFT_with_problems_and_responses Viewer • Updated May 22 • 1.68M • 107
Leveraging Recent Advances in Pre-Trained Language Models for Eye-Tracking Prediction Paper • 2110.04475 • Published Oct 9, 2021
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning Paper • 2504.01005 • Published Apr 1 • 15
ReasoningMila/syn_qs_and_soln_cleaned_0_and_less20_multiple_soln_per_qs_1937545 Viewer • Updated Mar 23 • 1.94M • 15