Vector Quantization using Gaussian Variational Autoencoder
This repository contains the official implementation of Gaussian Quant (GQ), a novel method for vector quantization presented in the paper "Vector Quantization using Gaussian Variational Autoencoder".
GQ is a simple yet effective technique that converts a Gaussian Variational Autoencoder (VAE) into a VQ-VAE without any additional training. It does so by generating random Gaussian noise as a codebook and snapping each posterior mean to its closest noise vector. Theoretically, a small quantization error is guaranteed whenever the logarithm of the codebook size exceeds the bits-back coding rate. Empirically, GQ combined with a heuristic called the target divergence constraint (TDC) outperforms previous VQ-VAEs such as VQGAN, FSQ, LFQ, and BSQ on both UNet and ViT architectures.
- 📚 Paper on Hugging Face: Vector Quantization using Gaussian Variational Autoencoder
- 🌐 Project Page: https://tongdaxu.github.io/pages/gq.html
- 💻 GitHub Repository: https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE
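The quantization step admits a compact illustration. Below is a minimal PyTorch sketch of the idea, not the repository's implementation: variable names and shapes are illustrative assumptions, and the paper's noise scaling and TDC heuristic are omitted.

```python
import torch

# Toy sketch of GQ quantization (illustrative, not the repo's code):
# the codebook is fixed random Gaussian noise, and quantization is a
# nearest-neighbor search from the posterior mean -- no training involved.
torch.manual_seed(0)                                 # codebook is reproducible random noise
codebook_size, codebook_dim = 2 ** 16, 16
codebook = torch.randn(codebook_size, codebook_dim)  # random Gaussian codebook

def gq_quantize(mu):
    """Snap posterior means of shape (n, codebook_dim) to their nearest codewords."""
    dists = torch.cdist(mu, codebook)  # (n, codebook_size) Euclidean distances
    indices = dists.argmin(dim=1)      # one discrete code per latent token
    return codebook[indices], indices

mu = torch.randn(8, codebook_dim)      # stand-in for 8 posterior means from an encoder
z_q, idx = gq_quantize(mu)             # quantized latents and their discrete indices
```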
Quick Start & Usage
This section provides a quick guide to installing the dependencies, downloading the pre-trained models, and running inference with them. For further details and training instructions, please refer to the GitHub repository.
Install dependencies
- Install the dependencies listed in `environment.yaml`:

```bash
conda env create --file=environment.yaml
conda activate tokenizer
```
Install this package
- From source:

```bash
pip install -e .
```

- [Optional] Build the CUDA kernel for a faster runtime:

```bash
cd gq_cuda_extension
pip install --no-build-isolation -e .
```
Download pre-trained model
- Download the model `sd3unet_gq_0.25.ckpt` from Hugging Face and place it under `models_256`:

```bash
mkdir models_256
mv "sd3unet_gq_0.25.ckpt" ./models_256
```

- This is a VQ-VAE with `codebook_size=2**16=65536` and `codebook_dim=16`, i.e., each latent token is a 16-dimensional vector quantized to one of 65,536 codewords (16 bits per token).
Infer the model as VQ-VAE
- Use the model as follows:

```python
from PIL import Image
from torchvision import transforms
from omegaconf import OmegaConf
from pit.util import instantiate_from_config
import torch

transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
img = transform(Image.open("demo.png")).unsqueeze(0).cuda()

config = OmegaConf.load("./configs/sd3unet_gq_0.25.yaml")
vae = instantiate_from_config(config.model)
vae.load_state_dict(
    torch.load("models_256/sd3unet_gq_0.25.ckpt", map_location=torch.device("cpu"))["state_dict"],
    strict=False,
)
vae = vae.eval().cuda()

z, log = vae.encode(img, return_reg_log=True)      # quantized latent + log with discrete indices
img_hat = vae.decode(vae.dequant(log["indices"]))  # decode from the discrete indices, or
img_hat = vae.decode(z)                            # decode the quantized latent directly
```
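To inspect the result, the reconstruction can be mapped back to image space. Here is a generic post-processing snippet, assuming the decoder output lives in the same [-1, 1] range as the normalized input:

```python
from torchvision.utils import save_image

# Undo the [-1, 1] normalization and write the reconstruction to disk.
# Reuses `img_hat` from the snippet above.
save_image(img_hat.clamp(-1, 1) * 0.5 + 0.5, "demo_hat.png")
```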
Infer the model as Gaussian VAE
- Alternatively, the model can be used as a vanilla Gaussian VAE:

```python
from PIL import Image
from torchvision import transforms
from omegaconf import OmegaConf
from pit.util import instantiate_from_config
import torch

transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
img = transform(Image.open("demo.png")).unsqueeze(0).cuda()

config = OmegaConf.load("./configs/sd3unet_gq_0.25.yaml")
vae = instantiate_from_config(config.model)
vae.load_state_dict(
    torch.load("models_256/sd3unet_gq_0.25.ckpt", map_location=torch.device("cpu"))["state_dict"],
    strict=False,
)
vae = vae.eval().cuda()

z = vae.encode(img, return_reg_log=True)[1]["zhat_noquant"]  # continuous (unquantized) latent
img_hat = vae.decode(z)
```
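Since both snippets load the same checkpoint, the quantization error the theory bounds can be probed directly by comparing the quantized latent with its continuous counterpart. A hedged sketch reusing `vae` and `img` from the snippets above (it assumes the two latents share a shape, which the shared decoder suggests):

```python
# Compare the quantized latent against the continuous Gaussian VAE latent.
z_q, log = vae.encode(img, return_reg_log=True)  # quantized latent + encoding log
z_c = log["zhat_noquant"]                        # continuous latent, as in the snippet above
print("quantization MSE:", torch.mean((z_q - z_c) ** 2).item())
```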
Citation
If you find our work helpful or inspiring, please feel free to cite it:
```bibtex
@misc{xu2025vectorquantizationusinggaussian,
      title={Vector Quantization using Gaussian Variational Autoencoder},
      author={Tongda Xu and Wendi Zheng and Jiajun He and Jose Miguel Hernandez-Lobato and Yan Wang and Ya-Qin Zhang and Jie Tang},
      year={2025},
      eprint={2512.06609},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2512.06609},
}
```