deepseek-community/Janus-Pro-1B

@@ -42,10 +42,105 @@ For multimodal understanding, it uses the [SigLIP-L](https://huggingface.co/timm
 Please refer to [**Github Repository**](https://github.com/deepseek-ai/Janus)
-## 4. License
 This code repository is licensed under [the MIT License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-CODE). The use of Janus-Pro models is subject to [DeepSeek Model License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-MODEL).
-## 5. Citation
 ```
 @article{chen2025janus,
@@ -56,6 +151,6 @@ This code repository is licensed under [the MIT License](https://github.com/deep
 }
 ```
-## 6. Contact
 If you have any questions, please raise an issue or contact us at [[email protected]](mailto:[email protected]).

 Please refer to [**Github Repository**](https://github.com/deepseek-ai/Janus)
+## 4. Usage Examples
+### Single Image Inference
+Here is an example of visual understanding with a single image.
+```python
+import torch
+from transformers import JanusForConditionalGeneration, JanusProcessor
+model_id = "deepseek-community/Janus-Pro-1B"
+# Build a single-turn chat message with one image and one text question
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {'type': 'image', 'url': 'http://images.cocodataset.org/val2017/000000039769.jpg'},
+            {'type': 'text', 'text': "What do you see in this image?"}
+        ]
+    },
+]
+# Load the processor and model (generation_mode="text" below selects text output)
+processor = JanusProcessor.from_pretrained(model_id)
+model = JanusForConditionalGeneration.from_pretrained(
+    model_id, torch_dtype=torch.bfloat16, device_map="auto"
+)
+inputs = processor.apply_chat_template(
+    messages,
+    add_generation_prompt=True,
+    generation_mode="text",
+    tokenize=True,
+    return_dict=True,
+    return_tensors="pt"
+).to(model.device, dtype=torch.bfloat16)
+output = model.generate(**inputs, max_new_tokens=40, generation_mode='text', do_sample=True)
+text = processor.decode(output[0], skip_special_tokens=True)
+print(text)
+```
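The chat message structure used in the example above can be wrapped in a small helper when the same question is asked about different images. This is a minimal sketch in plain Python: `build_user_message` is a hypothetical helper name and is not part of `transformers`. Whether the chat template accepts several images in one turn should be checked against the library documentation before relying on it.

```python
from typing import Dict, List


def build_user_message(image_urls: List[str], question: str) -> List[Dict]:
    """Build a one-turn message list: image entries first, then the text prompt.

    Hypothetical helper for illustration; it only assembles the same
    dictionary layout that the example above writes out by hand.
    """
    content = [{"type": "image", "url": url} for url in image_urls]
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]


messages = build_user_message(
    ["http://images.cocodataset.org/val2017/000000039769.jpg"],
    "What do you see in this image?",
)
print(messages[0]["content"][-1]["text"])
```

The resulting `messages` list has the same shape as the hand-written one, so it can be passed to `processor.apply_chat_template` in the same way.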
+### Text-to-Image Generation
+Janus can also generate images from text prompts: set the generation mode to `image`, as shown below.
+```python
+import torch
+from transformers import JanusForConditionalGeneration, JanusProcessor
+model_id = "deepseek-community/Janus-Pro-1B"
+# Load processor and model
+processor = JanusProcessor.from_pretrained(model_id)
+model = JanusForConditionalGeneration.from_pretrained(
+    model_id, torch_dtype=torch.bfloat16, device_map="auto"
+)
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {"type": "text", "text": "A dog running under the rain."}
+        ]
+    }
+]
+# Apply chat template
+prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
+inputs = processor(
+    text=prompt,
+    generation_mode="image",
+    return_tensors="pt"
+).to(model.device, dtype=torch.bfloat16)
+# Set number of images to generate
+model.generation_config.num_return_sequences = 2
+outputs = model.generate(
+    **inputs,
+    generation_mode="image",
+    do_sample=True,
+    use_cache=True
+)
+# Decode and save images
+decoded_image = model.decode_image_tokens(outputs)
+images = processor.postprocess(list(decoded_image.float()), return_tensors="PIL.Image.Image")
+for i, image in enumerate(images["pixel_values"]):
+    image.save(f"image{i}.png")
+```
+## 5. License
 This code repository is licensed under [the MIT License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-CODE). The use of Janus-Pro models is subject to [DeepSeek Model License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-MODEL).
+## 6. Citation
 ```
 @article{chen2025janus,
 }
 ```
+## 7. Contact
 If you have any questions, please raise an issue or contact us at [[email protected]](mailto:[email protected]).