AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
Abstract
AnimeGamer uses Multimodal Large Language Models (MLLMs) and a video diffusion model to generate contextually consistent, dynamic, and high-quality video clips for an infinite game simulation.
Recent advancements in image and video synthesis have opened up new promise in generative games. One particularly intriguing application is transforming characters from anime films into interactive, playable entities. This allows players to immerse themselves in the dynamic anime world as their favorite characters for life simulation through language instructions. Such games are defined as infinite game since they eliminate predetermined boundaries and fixed gameplay rules, where players can interact with the game world through open-ended language and experience ever-evolving storylines and environments. Recently, a pioneering approach for infinite anime life simulation employs large language models (LLMs) to translate multi-turn text dialogues into language instructions for image generation. However, it neglects historical visual context, leading to inconsistent gameplay. Furthermore, it only generates static images, failing to incorporate the dynamics necessary for an engaging gaming experience. In this work, we propose AnimeGamer, which is built upon Multimodal Large Language Models (MLLMs) to generate each game state, including dynamic animation shots that depict character movements and updates to character states, as illustrated in Figure 1. We introduce novel action-aware multimodal representations to represent animation shots, which can be decoded into high-quality video clips using a video diffusion model. By taking historical animation shot representations as context and predicting subsequent representations, AnimeGamer can generate games with contextual consistency and satisfactory dynamics. Extensive evaluations using both automated metrics and human evaluations demonstrate that AnimeGamer outperforms existing methods in various aspects of the gaming experience. Codes and checkpoints are available at https://github.com/TencentARC/AnimeGamer.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- MoCha: Towards Movie-Grade Talking Character Synthesis (2025)
- MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice (2025)
- Position: Interactive Generative Video as Next-Generation Game Engine (2025)
- WeGen: A Unified Model for Interactive Multimodal Generation as We Chat (2025)
- WorldScore: A Unified Evaluation Benchmark for World Generation (2025)
- VideoGen-Eval: Agent-based System for Video Generation Evaluation (2025)
- Automated Movie Generation via Multi-Agent CoT Planning (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
 You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: 
@librarian-bot
	 recommend
Models citing this paper 1
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
 
					 
						