InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Abstract
InfGen, a one-step generator replacing the VAE decoder, enables arbitrary high-resolution image generation from a fixed-size latent, significantly reducing computational complexity and generation time.
Arbitrary-resolution image generation provides a consistent visual experience across devices and has extensive applications for both producers and consumers. However, the computational demand of current diffusion models grows quadratically with resolution, pushing 4K image generation beyond 100 seconds. To address this, we explore a second generation stage on top of latent diffusion models: the fixed-size latent produced by the diffusion model is treated as the content representation, and we decode arbitrary-resolution images from this compact latent with a one-step generator. We thus present InfGen, which replaces the VAE decoder with this new generator to produce images at any resolution from a fixed-size latent, without retraining the diffusion model. This simplifies the pipeline, reduces computational complexity, and can be applied to any model sharing the same latent space. Experiments show that InfGen brings many existing models into the arbitrary high-resolution era while cutting 4K image generation time to under 10 seconds.
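The abstract describes decoding images at any target resolution from one fixed-size latent. The paper does not spell out the decoder architecture here, so the following is only a toy numpy sketch of that *interface*: a stand-in "one-step generator" (`decode_any_resolution`, a hypothetical name) that resamples the fixed latent grid to the requested output size and applies a fixed 1x1 projection to RGB. The real InfGen generator is learned; this just illustrates how one latent can serve multiple resolutions.

```python
import numpy as np

def decode_any_resolution(latent, out_h, out_w, rng=None):
    """Toy stand-in for a resolution-agnostic one-step decoder.

    latent: (c, h, w) fixed-size latent from a diffusion model.
    Nearest-neighbor resamples the latent grid to the target resolution,
    then applies a random 1x1 'generator' projection to 3 RGB channels.
    (Illustrative only -- not the paper's actual architecture.)
    """
    c, h, w = latent.shape
    # Map each output pixel back to a latent cell (nearest neighbor).
    ys = np.arange(out_h) * h // out_h
    xs = np.arange(out_w) * w // out_w
    grid = latent[:, ys[:, None], xs[None, :]]       # (c, out_h, out_w)
    rng = rng or np.random.default_rng(0)
    proj = rng.standard_normal((3, c)) / np.sqrt(c)  # 1x1 conv weights
    return np.einsum("oc,chw->ohw", proj, grid)      # (3, out_h, out_w)

# One fixed 32x32 latent, decoded at two different resolutions.
latent = np.random.default_rng(1).standard_normal((4, 32, 32))
img_hd = decode_any_resolution(latent, 1080, 1920)
img_4k = decode_any_resolution(latent, 2160, 3840)
print(img_hd.shape, img_4k.shape)  # (3, 1080, 1920) (3, 2160, 3840)
```

The point of the sketch: the diffusion model's cost is fixed by the latent size, and only the lightweight decode step scales with output resolution, which is what makes the paper's sub-10-second 4K claim plausible.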
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- PixNerd: Pixel Neural Field Diffusion (2025)
- CineScale: Free Lunch in High-Resolution Cinematic Visual Generation (2025)
- GPSToken: Gaussian Parameterized Spatially-adaptive Tokenization for Image Representation and Generation (2025)
- APT: Improving Diffusion Models for High Resolution Image Generation with Adaptive Path Tracing (2025)
- HiMat: DiT-based Ultra-High Resolution SVBRDF Generation (2025)
- Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis (2025)
- Steering One-Step Diffusion Model with Fidelity-Rich Decoder for Fast Image Compression (2025)
Very interesting paper. I wonder whether this method could also be used for native low-resolution image generation, such as pixel art. The lower end of the 'reliable exploration' range is 256, and I'm wondering whether sub-256 resolutions were left unexplored because low-resolution images were assumed to be undesirable.
True arbitrary resolution should also generalize to the extreme low end, right?