yaswanthgali committed · Commit f73b6fe · verified · 1 Parent(s): 4eeda9b

Update README.md

Files changed (1)
  1. README.md +98 -3
README.md CHANGED
@@ -42,10 +42,105 @@ For multimodal understanding, it uses the [SigLIP-L](https://huggingface.co/timm
  Please refer to [**Github Repository**](https://github.com/deepseek-ai/Janus)
 
 
- ## 4. License
 
  This code repository is licensed under [the MIT License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-CODE). The use of Janus-Pro models is subject to [DeepSeek Model License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-MODEL).
- ## 5. Citation
 
  ```
  @article{chen2025janus,
@@ -56,6 +151,6 @@ This code repository is licensed under [the MIT License](https://github.com/deep
  }
  ```
 
- ## 6. Contact
 
  If you have any questions, please raise an issue or contact us at [[email protected]](mailto:[email protected]).
 
  Please refer to [**Github Repository**](https://github.com/deepseek-ai/Janus)
 
 
+ ## 4. Usage Examples
+
+ ### Single Image Inference
+
+ Here is an example of visual understanding with a single image.
+
+ ```python
+ import torch
+ from PIL import Image
+ import requests
+ from transformers import JanusForConditionalGeneration, JanusProcessor
+
+ model_id = "deepseek-community/Janus-Pro-1B"
+
+ # Prepare input for generation
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
+             {"type": "text", "text": "What do you see in this image?"},
+         ],
+     },
+ ]
+
+ # Load processor and model
+ processor = JanusProcessor.from_pretrained(model_id)
+ model = JanusForConditionalGeneration.from_pretrained(
+     model_id, torch_dtype=torch.bfloat16, device_map="auto"
+ )
+
+ # Set generation mode to "text" to perform text generation
+ inputs = processor.apply_chat_template(
+     messages,
+     add_generation_prompt=True,
+     generation_mode="text",
+     tokenize=True,
+     return_dict=True,
+     return_tensors="pt",
+ ).to(model.device, dtype=torch.bfloat16)
+
+ # Generate up to 40 new tokens and decode them back to text
+ output = model.generate(**inputs, max_new_tokens=40, generation_mode="text", do_sample=True)
+ text = processor.decode(output[0], skip_special_tokens=True)
+ print(text)
+ ```
+
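The `messages` list in the example above has a fixed shape: one user turn whose `content` holds an image entry followed by a text entry. A minimal sketch of a helper that builds that structure is given here. `build_image_question` is a hypothetical name, not part of `transformers`, and it assumes the same `type`, `url`, and `text` keys used in the example.

```python
from typing import Any


def build_image_question(image_url: str, question: str) -> list[dict[str, Any]]:
    """Build a single-turn chat with one image and one question.

    Hypothetical helper: it only mirrors the dictionary layout used in the
    example above and does not call any library code.
    """
    if not image_url or not question:
        raise ValueError("image_url and question must be non-empty")
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


messages = build_image_question(
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    "What do you see in this image?",
)
```

The resulting list has the same layout as the hand-written `messages` above, so it can be passed to `processor.apply_chat_template` in the same way.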
+ ### Text-to-Image Generation
+ Janus can also generate images from text prompts by setting the generation mode to `image`, as shown below.
+
+ ```python
+ import torch
+ from transformers import JanusForConditionalGeneration, JanusProcessor
+
+ model_id = "deepseek-community/Janus-Pro-1B"
+
+ # Load processor and model
+ processor = JanusProcessor.from_pretrained(model_id)
+ model = JanusForConditionalGeneration.from_pretrained(
+     model_id, torch_dtype=torch.bfloat16, device_map="auto"
+ )
+
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "text", "text": "A dog running under the rain."}
+         ],
+     }
+ ]
+
+ # Apply chat template
+ prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
+ inputs = processor(
+     text=prompt,
+     generation_mode="image",
+     return_tensors="pt",
+ ).to(model.device, dtype=torch.bfloat16)
+
+ # Set number of images to generate
+ model.generation_config.num_return_sequences = 2
+
+ outputs = model.generate(
+     **inputs,
+     generation_mode="image",
+     do_sample=True,
+     use_cache=True,
+ )
+
+ # Decode and save images
+ decoded_image = model.decode_image_tokens(outputs)
+ images = processor.postprocess(list(decoded_image.float()), return_tensors="PIL.Image.Image")
+
+ for i, image in enumerate(images["pixel_values"]):
+     image.save(f"image{i}.png")
+ ```
+
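The final loop in the example above writes `image0.png`, `image1.png`, and so on into the current directory. As a small variation, the sketch below collects the files in an output folder instead. `save_images`, `out_dir`, and `prefix` are illustrative names that do not appear in the original example, and the helper assumes only that each item exposes a `save(path)` method, as `PIL.Image.Image` does.

```python
from pathlib import Path
from typing import Iterable, Protocol


class Savable(Protocol):
    """Anything with a PIL-style save(path) method."""

    def save(self, fp: str) -> None: ...


def save_images(
    images: Iterable[Savable], out_dir: str = "outputs", prefix: str = "image"
) -> list[Path]:
    """Save each image as <out_dir>/<prefix><index>.png and return the paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    paths = []
    for i, image in enumerate(images):
        path = target / f"{prefix}{i}.png"
        image.save(str(path))
        paths.append(path)
    return paths
```

With the example above, this would be called as `save_images(images["pixel_values"])`.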
+ ## 5. License
 
  This code repository is licensed under [the MIT License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-CODE). The use of Janus-Pro models is subject to [DeepSeek Model License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-MODEL).
+ ## 6. Citation
 
  ```
  @article{chen2025janus,
 
  }
  ```
 
+ ## 7. Contact
 
  If you have any questions, please raise an issue or contact us at [[email protected]](mailto:[email protected]).