URSA: Uniform Discrete Diffusion with Metric Path for Video Generation
Use the 🤗 Diffusers library to run URSA in a simple and efficient manner.
pip install diffusers transformers accelerate imageio[ffmpeg]
pip install git+ssh://git@github.com/baaivision/URSA.git
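If SSH access to GitHub is not configured, installing the same repository over HTTPS should work as well (an assumption based on the SSH URL above):
pip install git+https://github.com/baaivision/URSA.git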
Running the pipeline:
import os, torch, numpy
from diffnext.pipelines import URSAPipeline
from diffnext.utils import export_to_video
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
model_id, height, width = "BAAI/URSA-1.7B-FSQ320", 320, 512
model_args = {"torch_dtype": torch.float16, "trust_remote_code": True}
pipe = URSAPipeline.from_pretrained(model_id, **model_args)
pipe = pipe.to(torch.device("cuda"))
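# Setup notes: float16 halves the memory footprint versus float32, and
# "expandable_segments:True" lets PyTorch's CUDA caching allocator grow
# existing memory segments instead of fragmenting, which helps because the
# calls below allocate tensors of varying frame counts.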
text_prompt = "a lone grizzly bear walks through a misty forest at dawn, sunlight catching its fur."
negative_prompt = "worst quality, low quality, inconsistent motion, static, still, blurry, jittery, distorted, ugly"
# Text-to-Image
prompt = text_prompt
num_frames, num_inference_steps = 1, 25
image = pipe(**locals()).frames[0]
image.save("ursa.jpg")
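# Note: pipe(**locals()) forwards every local variable (prompt, image, video,
# num_frames, ...) as keyword arguments, and the pipeline picks out the ones
# it recognizes. The sections below rely on this: the image generated above
# becomes the conditioning input of the Image-to-Video call. A more explicit
# equivalent (a sketch; it assumes unrecognized keywords are ignored):
#
#   image = pipe(
#       prompt=prompt, negative_prompt=negative_prompt,
#       height=height, width=width,
#       num_frames=num_frames, num_inference_steps=num_inference_steps,
#   ).frames[0]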
# Image-to-Video
prompt = f"motion=9.0, {text_prompt}"
num_frames, num_inference_steps = 49, 50
video = pipe(**locals()).frames[0]
export_to_video(video, "ursa_1+48f.mp4", fps=12)
# Text-to-Video
image, video = None, None  # clear stale conditioning so locals() passes none
prompt = f"motion=9.0, {text_prompt}"
num_frames, num_inference_steps = 49, 50
video = pipe(**locals()).frames[0]
export_to_video(video, "ursa_49f.mp4", fps=12)
# Video-to-Video
prompt = f"motion=5.0, {text_prompt}"
num_frames, num_inference_steps = 49, 50
num_cond_frames, cond_noise_scale = 13, 0.1
for i in range(12):
    # Keep the last 13 frames as conditioning, generate 49 frames,
    # then append only the frames that are new.
    video, start_video = video[-num_cond_frames:], video
    video = pipe(**locals()).frames[0]
    video = numpy.concatenate([start_video, video[num_cond_frames:]])
export_to_video(video, "ursa_{}f.mp4".format(video.shape[0]), fps=12)
The model is intended for research purposes only. Excluded uses are described below.
The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.
Using the model to generate content that is cruel to individuals is a misuse of this model.
While the capabilities of image and video generation models are impressive, they can also reinforce or exacerbate social biases.