arxiv:2412.01928

MALT: Improving Reasoning with Multi-Agent LLM Training

Published on Dec 2, 2024 · Submitted by Ronald Clark on Dec 4, 2024
#3 Paper of the day

Abstract

Multi-agent LLM training improves performance on reasoning tasks by assigning specialized roles and utilizing joint outcome-based rewards to enhance collaboration among models.

AI-generated summary

Enabling effective collaboration among LLMs is a crucial step toward developing autonomous systems capable of solving complex problems. While LLMs are typically used as single-model generators, where humans critique and refine their outputs, the potential for jointly trained collaborative models remains largely unexplored. Despite promising results in multi-agent communication and debate settings, little progress has been made in training models to work together on tasks. In this paper, we present a first step toward "Multi-agent LLM training" (MALT) on reasoning problems. Our approach employs a sequential multi-agent setup with heterogeneous LLMs assigned specialized roles: a generator, a verifier, and a refinement model that iteratively solve problems. We propose a trajectory-expansion-based synthetic data generation process and a credit assignment strategy driven by joint outcome-based rewards. This enables our post-training setup to utilize both positive and negative trajectories to autonomously improve each model's specialized capabilities as part of a joint sequential system. We evaluate our approach across MATH, GSM8K, and CQA, where MALT on Llama 3.1 8B models achieves relative improvements of 14.14%, 7.12%, and 9.40%, respectively, over the same baseline model. This demonstrates an early advance in multi-agent cooperative capabilities for performance on mathematical and common sense reasoning questions. More generally, our work provides a concrete direction for research around multi-agent LLM training approaches.
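The abstract names the pieces (a generator, verifier, and refinement model run in sequence, trajectory expansion, and credit assignment from a joint outcome reward) but not their details. The sketch below is one possible reading, not the authors' implementation. Everything in it is an assumption made for illustration: the branching factor `n`, the stub samplers standing in for the three LLMs, the rule that a node's value is the mean outcome of the final answers beneath it, and the 0.5 threshold used to split positive from negative examples.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, List, Tuple

# Hypothetical stand-in for a role-specialized LLM: (prompt, sample index) -> text.
Sampler = Callable[[str, int], str]


@dataclass
class Node:
    role: str                      # "generator", "verifier", or "refiner"
    text: str                      # the text this role produced
    children: List["Node"] = field(default_factory=list)
    value: float = 0.0             # filled in by assign_credit


def expand(question: str, generator: Sampler, verifier: Sampler,
           refiner: Sampler, n: int) -> List[Node]:
    """Sample n outputs per role at every step, giving n**3 full trajectories."""
    roots = []
    for i in range(n):
        g = Node("generator", generator(question, i))
        for j in range(n):
            v = Node("verifier", verifier(f"{question}\n{g.text}", j))
            for k in range(n):
                prompt = f"{question}\n{g.text}\n{v.text}"
                v.children.append(Node("refiner", refiner(prompt, k)))
            g.children.append(v)
        roots.append(g)
    return roots


def assign_credit(node: Node, is_correct: Callable[[str], bool]) -> float:
    """Assumed rule: a leaf gets the binary outcome reward of its final answer;
    an inner node gets the mean value of its children."""
    if not node.children:
        node.value = 1.0 if is_correct(node.text) else 0.0
    else:
        node.value = mean(assign_credit(c, is_correct) for c in node.children)
    return node.value


def label(node: Node, threshold: float = 0.5) -> List[Tuple[str, str, bool]]:
    """Flatten the tree into (role, text, is_positive) post-training examples."""
    examples = [(node.role, node.text, node.value > threshold)]
    for child in node.children:
        examples.extend(label(child, threshold))
    return examples


# Toy demo with deterministic stubs instead of real models.
def toy_generator(prompt: str, i: int) -> str:
    return f"draft {i}"


def toy_verifier(prompt: str, j: int) -> str:
    return f"critique {j}"


def toy_refiner(prompt: str, k: int) -> str:
    # Always right after "draft 0"; otherwise right only for the first sample.
    return "42" if "draft 0" in prompt or k == 0 else "7"


roots = expand("What is 6 * 7?", toy_generator, toy_verifier, toy_refiner, n=2)
for root in roots:
    assign_credit(root, is_correct=lambda answer: answer == "42")
examples = [ex for root in roots for ex in label(root)]
```

In this toy run the first draft leads only to correct final answers, so it is labeled positive, while the second draft leads to a correct answer half the time and falls on the negative side of the assumed threshold. Each role's labeled outputs could then feed that role's own post-training data, which is the sense in which a single outcome reward is shared across the sequential system.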

Community


We propose a multi-agent LLM training approach (MALT) that specialises a base LLM into different roles (a generator, a verifier, and a refinement model) to collaboratively solve reasoning tasks through iterative problem-solving. We use synthetic data generation and outcome-based rewards to improve each model's capabilities, demonstrating significant performance gains on mathematical and common sense reasoning benchmarks.

