rp-yu committed on
Commit
38b26bc
·
verified ·
1 Parent(s): 37aed80
README.md ADDED
@@ -0,0 +1,199 @@
1
+ ---
2
+ library_name: transformers
3
+ tags: []
4
+ ---
5
+
6
+ # Model Card for Model ID
7
+
8
+ <!-- Provide a quick summary of what the model is/does. -->
9
+
10
+
11
+
12
+ ## Model Details
13
+
14
+ ### Model Description
15
+
16
+ <!-- Provide a longer summary of what this model is. -->
17
+
18
+ This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
+
20
+ - **Developed by:** [More Information Needed]
21
+ - **Funded by [optional]:** [More Information Needed]
22
+ - **Shared by [optional]:** [More Information Needed]
23
+ - **Model type:** [More Information Needed]
24
+ - **Language(s) (NLP):** [More Information Needed]
25
+ - **License:** [More Information Needed]
26
+ - **Finetuned from model [optional]:** [More Information Needed]
27
+
28
+ ### Model Sources [optional]
29
+
30
+ <!-- Provide the basic links for the model. -->
31
+
32
+ - **Repository:** [More Information Needed]
33
+ - **Paper [optional]:** [More Information Needed]
34
+ - **Demo [optional]:** [More Information Needed]
35
+
36
+ ## Uses
37
+
38
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
+
40
+ ### Direct Use
41
+
42
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
+
44
+ [More Information Needed]
45
+
46
+ ### Downstream Use [optional]
47
+
48
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
+
50
+ [More Information Needed]
51
+
52
+ ### Out-of-Scope Use
53
+
54
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
+
56
+ [More Information Needed]
57
+
58
+ ## Bias, Risks, and Limitations
59
+
60
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
+
62
+ [More Information Needed]
63
+
64
+ ### Recommendations
65
+
66
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
+
68
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
+
70
+ ## How to Get Started with the Model
71
+
72
+ Use the code below to get started with the model.
73
+
74
+ [More Information Needed]
75
+
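Until the authors fill in this section, here is a minimal, hypothetical sketch of what getting started might look like with the processing pipeline added in this commit. It assumes the repo id `rp-yu/Dimple-v0-Base-7B` that appears in `tokenization_dimple.py`, that you accept running the remote code shipped in this repository (`trust_remote_code=True`), and an arbitrary local image path; it is not the authors' official usage example.

```python
# Hypothetical getting-started sketch (editor's assumption, not from the model card).
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(
    "rp-yu/Dimple-v0-Base-7B",   # repo id taken from tokenization_dimple.py
    trust_remote_code=True,      # loads processing_dimple.py / image_processing_dimple.py
)

messages = [
    {"role": "user",
     "content": [{"type": "image"},
                 {"type": "text", "text": "Describe this image."}]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

image = Image.open("example.jpg")   # hypothetical local image path
inputs = processor(images=image, text=prompt, return_tensors="pt")
print(inputs.keys())                # input_ids, attention_mask, pixel_values, image_grid_thw
```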
76
+ ## Training Details
77
+
78
+ ### Training Data
79
+
80
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
+
82
+ [More Information Needed]
83
+
84
+ ### Training Procedure
85
+
86
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
+
88
+ #### Preprocessing [optional]
89
+
90
+ [More Information Needed]
91
+
92
+
93
+ #### Training Hyperparameters
94
+
95
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
+
97
+ #### Speeds, Sizes, Times [optional]
98
+
99
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
+
101
+ [More Information Needed]
102
+
103
+ ## Evaluation
104
+
105
+ <!-- This section describes the evaluation protocols and provides the results. -->
106
+
107
+ ### Testing Data, Factors & Metrics
108
+
109
+ #### Testing Data
110
+
111
+ <!-- This should link to a Dataset Card if possible. -->
112
+
113
+ [More Information Needed]
114
+
115
+ #### Factors
116
+
117
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
+
119
+ [More Information Needed]
120
+
121
+ #### Metrics
122
+
123
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
+
125
+ [More Information Needed]
126
+
127
+ ### Results
128
+
129
+ [More Information Needed]
130
+
131
+ #### Summary
132
+
133
+
134
+
135
+ ## Model Examination [optional]
136
+
137
+ <!-- Relevant interpretability work for the model goes here -->
138
+
139
+ [More Information Needed]
140
+
141
+ ## Environmental Impact
142
+
143
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
+
145
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
+
147
+ - **Hardware Type:** [More Information Needed]
148
+ - **Hours used:** [More Information Needed]
149
+ - **Cloud Provider:** [More Information Needed]
150
+ - **Compute Region:** [More Information Needed]
151
+ - **Carbon Emitted:** [More Information Needed]
152
+
153
+ ## Technical Specifications [optional]
154
+
155
+ ### Model Architecture and Objective
156
+
157
+ [More Information Needed]
158
+
159
+ ### Compute Infrastructure
160
+
161
+ [More Information Needed]
162
+
163
+ #### Hardware
164
+
165
+ [More Information Needed]
166
+
167
+ #### Software
168
+
169
+ [More Information Needed]
170
+
171
+ ## Citation [optional]
172
+
173
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
+
175
+ **BibTeX:**
176
+
177
+ [More Information Needed]
178
+
179
+ **APA:**
180
+
181
+ [More Information Needed]
182
+
183
+ ## Glossary [optional]
184
+
185
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
+
187
+ [More Information Needed]
188
+
189
+ ## More Information [optional]
190
+
191
+ [More Information Needed]
192
+
193
+ ## Model Card Authors [optional]
194
+
195
+ [More Information Needed]
196
+
197
+ ## Model Card Contact
198
+
199
+ [More Information Needed]
added_tokens.json ADDED
@@ -0,0 +1,26 @@
1
+ {
2
+ "</tool_call>": 151658,
3
+ "<tool_call>": 151657,
4
+ "<|beginoftext|>": 151665,
5
+ "<|box_end|>": 151649,
6
+ "<|box_start|>": 151648,
7
+ "<|endoftext|>": 151643,
8
+ "<|file_sep|>": 151664,
9
+ "<|fim_middle|>": 151660,
10
+ "<|fim_pad|>": 151662,
11
+ "<|fim_prefix|>": 151659,
12
+ "<|fim_suffix|>": 151661,
13
+ "<|im_end|>": 151645,
14
+ "<|im_start|>": 151644,
15
+ "<|image_pad|>": 151655,
16
+ "<|mask|>": 151666,
17
+ "<|object_ref_end|>": 151647,
18
+ "<|object_ref_start|>": 151646,
19
+ "<|quad_end|>": 151651,
20
+ "<|quad_start|>": 151650,
21
+ "<|repo_name|>": 151663,
22
+ "<|video_pad|>": 151656,
23
+ "<|vision_end|>": 151653,
24
+ "<|vision_pad|>": 151654,
25
+ "<|vision_start|>": 151652
26
+ }
chat_template.json ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "chat_template": "{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n{% endif %}<|im_start|>{{ message['role'] }}\n{% if message['content'] is string %}{{ message['content'] }}<|im_end|>\n{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>\n{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
3
+ }
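For reference, a condensed sketch of how the template above renders a multimodal chat. It keeps only the image and text branches (dropping the `add_vision_id` counters and video handling) and feeds the string to `jinja2` directly; in practice the full template is applied through the processor's `apply_chat_template`.

```python
# Condensed rendering sketch for the chat template (image + text only).
from jinja2 import Template

condensed = (
    "{% for message in messages %}"
    "{% if loop.first and message['role'] != 'system' %}"
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n{% endif %}"
    "<|im_start|>{{ message['role'] }}\n"
    "{% if message['content'] is string %}{{ message['content'] }}<|im_end|>\n"
    "{% else %}{% for content in message['content'] %}"
    "{% if content['type'] == 'image' %}<|vision_start|><|image_pad|><|vision_end|>"
    "{% elif 'text' in content %}{{ content['text'] }}{% endif %}"
    "{% endfor %}<|im_end|>\n{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

messages = [
    {"role": "user",
     "content": [{"type": "image"}, {"type": "text", "text": "Describe this image."}]},
]
print(Template(condensed).render(messages=messages, add_generation_prompt=True))
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# <|vision_start|><|image_pad|><|vision_end|>Describe this image.<|im_end|>
# <|im_start|>assistant
```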
image_processing_dimple.py ADDED
@@ -0,0 +1,458 @@
1
+ # coding=utf-8
2
+ # Copyright 2024 The Dimple team and the HuggingFace Inc. team. All rights reserved.
3
+ #
4
+ # This code is based on EleutherAI's GPT-NeoX library and the GPT-NeoX
5
+ # and OPT implementations in this library. It has been modified from its
6
+ # original forms to accommodate minor architectural differences compared
7
+ # to GPT-NeoX and OPT used by the Meta AI team that trained the model.
8
+ #
9
+ # Licensed under the Apache License, Version 2.0 (the "License");
10
+ # you may not use this file except in compliance with the License.
11
+ # You may obtain a copy of the License at
12
+ #
13
+ # http://www.apache.org/licenses/LICENSE-2.0
14
+ #
15
+ # Unless required by applicable law or agreed to in writing, software
16
+ # distributed under the License is distributed on an "AS IS" BASIS,
17
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
18
+ # See the License for the specific language governing permissions and
19
+ # limitations under the License.
20
+ """Image processor class for Dimple."""
21
+
22
+ import math
23
+ from typing import Dict, List, Optional, Union
24
+
25
+ import numpy as np
26
+
27
+ from transformers.image_processing_utils import BaseImageProcessor, BatchFeature
28
+ from transformers.image_transforms import (
29
+ convert_to_rgb,
30
+ resize,
31
+ to_channel_dimension_format,
32
+ )
33
+ from transformers.image_utils import (
34
+ OPENAI_CLIP_MEAN,
35
+ OPENAI_CLIP_STD,
36
+ ChannelDimension,
37
+ ImageInput,
38
+ PILImageResampling,
39
+ VideoInput,
40
+ get_image_size,
41
+ infer_channel_dimension_format,
42
+ is_scaled_image,
43
+ is_valid_image,
44
+ make_list_of_images,
45
+ to_numpy_array,
46
+ valid_images,
47
+ validate_preprocess_arguments,
48
+ )
49
+ from transformers.utils import TensorType, is_vision_available, logging
50
+
51
+
52
+ logger = logging.get_logger("Dimple."+__name__)
53
+
54
+
55
+ if is_vision_available():
56
+ from PIL import Image
57
+
58
+
59
+ def make_batched_images(images) -> List[List[ImageInput]]:
60
+ """
61
+ Accepts images in list or nested list format, and makes a list of images for preprocessing.
62
+
63
+ Args:
64
+ images (`Union[List[List[ImageInput]], List[ImageInput], ImageInput]`):
65
+ The input image.
66
+
67
+ Returns:
68
+ list: A list of images.
69
+ """
70
+ if isinstance(images, (list, tuple)) and isinstance(images[0], (list, tuple)) and is_valid_image(images[0][0]):
71
+ return [img for img_list in images for img in img_list]
72
+
73
+ elif isinstance(images, (list, tuple)) and is_valid_image(images[0]):
74
+ return images
75
+
76
+ elif is_valid_image(images):
77
+ return [images]
78
+
79
+ raise ValueError(f"Could not make batched images from {images}")
80
+
81
+
82
+ # Copied from transformers.models.llava_next_video.image_processing_llava_next_video.make_batched_videos
83
+ def make_batched_videos(videos) -> List[VideoInput]:
84
+ if isinstance(videos, (list, tuple)) and isinstance(videos[0], (list, tuple)) and is_valid_image(videos[0][0]):
85
+ return videos
86
+
87
+ elif isinstance(videos, (list, tuple)) and is_valid_image(videos[0]):
88
+ if isinstance(videos[0], Image.Image):
89
+ return [videos]
90
+ elif len(videos[0].shape) == 4:
91
+ return [list(video) for video in videos]
92
+
93
+ elif is_valid_image(videos) and len(videos.shape) == 4:
94
+ return [list(videos)]
95
+
96
+ raise ValueError(f"Could not make batched video from {videos}")
97
+
98
+
99
+ def smart_resize(
100
+ height: int, width: int, factor: int = 28, min_pixels: int = 56 * 56, max_pixels: int = 14 * 14 * 4 * 1280
101
+ ):
102
+ """Rescales the image so that the following conditions are met:
103
+
104
+ 1. Both dimensions (height and width) are divisible by 'factor'.
105
+
106
+ 2. The total number of pixels is within the range ['min_pixels', 'max_pixels'].
107
+
108
+ 3. The aspect ratio of the image is maintained as closely as possible.
109
+
110
+ """
111
+ if height < factor or width < factor:
112
+ raise ValueError(f"height:{height} or width:{width} must be larger than factor:{factor}")
113
+ elif max(height, width) / min(height, width) > 200:
114
+ raise ValueError(
115
+ f"absolute aspect ratio must be smaller than 200, got {max(height, width) / min(height, width)}"
116
+ )
117
+ h_bar = round(height / factor) * factor
118
+ w_bar = round(width / factor) * factor
119
+ if h_bar * w_bar > max_pixels:
120
+ beta = math.sqrt((height * width) / max_pixels)
121
+ h_bar = math.floor(height / beta / factor) * factor
122
+ w_bar = math.floor(width / beta / factor) * factor
123
+ elif h_bar * w_bar < min_pixels:
124
+ beta = math.sqrt(min_pixels / (height * width))
125
+ h_bar = math.ceil(height * beta / factor) * factor
126
+ w_bar = math.ceil(width * beta / factor) * factor
127
+ return h_bar, w_bar
128
+
129
+
130
+ class DimpleImageProcessor(BaseImageProcessor):
131
+ r"""
132
+ Constructs a Dimple image processor that dynamically resizes images based on the original images.
133
+
134
+ Args:
135
+ do_resize (`bool`, *optional*, defaults to `True`):
136
+ Whether to resize the image's (height, width) dimensions.
137
+ resample (`PILImageResampling`, *optional*, defaults to `Resampling.BICUBIC`):
138
+ Resampling filter to use when resizing the image.
139
+ do_rescale (`bool`, *optional*, defaults to `True`):
140
+ Whether to rescale the image by the specified scale `rescale_factor`.
141
+ rescale_factor (`int` or `float`, *optional*, defaults to `1/255`):
142
+ Scale factor to use if rescaling the image.
143
+ do_normalize (`bool`, *optional*, defaults to `True`):
144
+ Whether to normalize the image.
145
+ image_mean (`float` or `List[float]`, *optional*, defaults to `[0.48145466, 0.4578275, 0.40821073]`):
146
+ Mean to use if normalizing the image. This is a float or list of floats for each channel in the image.
147
+ image_std (`float` or `List[float]`, *optional*, defaults to `[0.26862954, 0.26130258, 0.27577711]`):
148
+ Standard deviation to use if normalizing the image. This is a float or list of floats for each channel in the image.
149
+ do_convert_rgb (`bool`, *optional*, defaults to `True`):
150
+ Whether to convert the image to RGB.
151
+ min_pixels (`int`, *optional*, defaults to `56 * 56`):
152
+ The min pixels of the image to resize the image.
153
+ max_pixels (`int`, *optional*, defaults to `28 * 28 * 1280`):
154
+ The max pixels of the image to resize the image.
155
+ patch_size (`int`, *optional*, defaults to 14):
156
+ The spatial patch size of the vision encoder.
157
+ temporal_patch_size (`int`, *optional*, defaults to 2):
158
+ The temporal patch size of the vision encoder.
159
+ merge_size (`int`, *optional*, defaults to 2):
160
+ The merge size of the vision encoder to llm encoder.
161
+ """
162
+
163
+ model_input_names = ["pixel_values", "image_grid_thw", "pixel_values_videos", "video_grid_thw"]
164
+
165
+ def __init__(
166
+ self,
167
+ do_resize: bool = True,
168
+ resample: PILImageResampling = PILImageResampling.BICUBIC,
169
+ do_rescale: bool = True,
170
+ rescale_factor: Union[int, float] = 1 / 255,
171
+ do_normalize: bool = True,
172
+ image_mean: Optional[Union[float, List[float]]] = None,
173
+ image_std: Optional[Union[float, List[float]]] = None,
174
+ do_convert_rgb: bool = True,
175
+ min_pixels: int = 56 * 56,
176
+ max_pixels: int = 28 * 28 * 1280,
177
+ patch_size: int = 14,
178
+ temporal_patch_size: int = 2,
179
+ merge_size: int = 2,
180
+ **kwargs,
181
+ ) -> None:
182
+ super().__init__(**kwargs)
183
+ self.do_resize = do_resize
184
+ self.resample = resample
185
+ self.do_rescale = do_rescale
186
+ self.rescale_factor = rescale_factor
187
+ self.do_normalize = do_normalize
188
+ self.image_mean = image_mean if image_mean is not None else OPENAI_CLIP_MEAN
189
+ self.image_std = image_std if image_std is not None else OPENAI_CLIP_STD
190
+ self.min_pixels = min_pixels
191
+ self.max_pixels = max_pixels
192
+ self.patch_size = patch_size
193
+ self.temporal_patch_size = temporal_patch_size
194
+ self.merge_size = merge_size
195
+ self.size = {"min_pixels": min_pixels, "max_pixels": max_pixels}
196
+ self.do_convert_rgb = do_convert_rgb
197
+
198
+ def _preprocess(
199
+ self,
200
+ images: Union[ImageInput, VideoInput],
201
+ do_resize: bool = None,
202
+ resample: PILImageResampling = None,
203
+ do_rescale: bool = None,
204
+ rescale_factor: float = None,
205
+ do_normalize: bool = None,
206
+ image_mean: Optional[Union[float, List[float]]] = None,
207
+ image_std: Optional[Union[float, List[float]]] = None,
208
+ do_convert_rgb: bool = None,
209
+ data_format: Optional[ChannelDimension] = ChannelDimension.FIRST,
210
+ input_data_format: Optional[Union[str, ChannelDimension]] = None,
211
+ ):
212
+ """
213
+ Preprocess an image or batch of images. Copy of the `preprocess` method from `CLIPImageProcessor`.
214
+
215
+ Args:
216
+ images (`ImageInput`):
217
+ Image or batch of images to preprocess. Expects pixel values ranging from 0 to 255. If pixel values range from 0 to 1, set `do_rescale=False`.
218
+ vision_info (`List[Dict]`, *optional*):
219
+ Optional list of dictionaries containing additional information about vision inputs.
220
+ do_resize (`bool`, *optional*, defaults to `self.do_resize`):
221
+ Whether to resize the image.
222
+ resample (`PILImageResampling`, *optional*, defaults to `self.resample`):
223
+ Resampling filter to use if resizing the image. This can be one of the `PILImageResampling` enums.
224
+ do_rescale (`bool`, *optional*, defaults to `self.do_rescale`):
225
+ Whether to rescale the image.
226
+ rescale_factor (`float`, *optional*, defaults to `self.rescale_factor`):
227
+ Scale factor to use if rescaling the image.
228
+ do_normalize (`bool`, *optional*, defaults to `self.do_normalize`):
229
+ Whether to normalize the image.
230
+ image_mean (`float` or `List[float]`, *optional*, defaults to `self.image_mean`):
231
+ Mean to use if normalizing the image. Can be a float or a list of floats corresponding to the number of channels in the image.
232
+ image_std (`float` or `List[float]`, *optional*, defaults to `self.image_std`):
233
+ Standard deviation to use if normalizing the image. Can be a float or a list of floats corresponding to the number of channels in the image.
234
+ do_convert_rgb (`bool`, *optional*, defaults to `self.do_convert_rgb`):
235
+ Whether to convert the image to RGB.
236
+ data_format (`ChannelDimension`, *optional*, defaults to `ChannelDimension.FIRST`):
237
+ The channel dimension format for the output image. Can be one of:
238
+ - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
239
+ - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
240
+ - Unset: Use the channel dimension format of the input image.
241
+ input_data_format (`ChannelDimension` or `str`, *optional*):
242
+ The channel dimension format for the input image. Can be one of:
243
+ - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
244
+ - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
245
+ - `"none"` or `ChannelDimension.NONE`: image in (height, width) format.
246
+ """
247
+ images = make_list_of_images(images)
248
+
249
+ if do_convert_rgb:
250
+ images = [convert_to_rgb(image) for image in images]
251
+
252
+ # All transformations expect numpy arrays.
253
+ images = [to_numpy_array(image) for image in images]
254
+
255
+ if is_scaled_image(images[0]) and do_rescale:
256
+ logger.warning_once(
257
+ "It looks like you are trying to rescale already rescaled images. If the input"
258
+ " images have pixel values between 0 and 1, set `do_rescale=False` to avoid rescaling them again."
259
+ )
260
+ if input_data_format is None:
261
+ # We assume that all images have the same channel dimension format.
262
+ input_data_format = infer_channel_dimension_format(images[0])
263
+
264
+ height, width = get_image_size(images[0], channel_dim=input_data_format)
265
+ resized_height, resized_width = height, width
266
+ processed_images = []
267
+ for image in images:
268
+ if do_resize:
269
+ resized_height, resized_width = smart_resize(
270
+ height,
271
+ width,
272
+ factor=self.patch_size * self.merge_size,
273
+ min_pixels=self.min_pixels,
274
+ max_pixels=self.max_pixels,
275
+ )
276
+ image = resize(
277
+ image, size=(resized_height, resized_width), resample=resample, input_data_format=input_data_format
278
+ )
279
+
280
+ if do_rescale:
281
+ image = self.rescale(image, scale=rescale_factor, input_data_format=input_data_format)
282
+
283
+ if do_normalize:
284
+ image = self.normalize(
285
+ image=image, mean=image_mean, std=image_std, input_data_format=input_data_format
286
+ )
287
+
288
+ image = to_channel_dimension_format(image, data_format, input_channel_dim=input_data_format)
289
+ processed_images.append(image)
290
+
291
+ patches = np.array(processed_images)
292
+ if data_format == ChannelDimension.LAST:
293
+ patches = patches.transpose(0, 3, 1, 2)
294
+ if patches.shape[0] == 1:
295
+ patches = np.tile(patches, (self.temporal_patch_size, 1, 1, 1))
296
+ channel = patches.shape[1]
297
+ grid_t = patches.shape[0] // self.temporal_patch_size
298
+ grid_h, grid_w = resized_height // self.patch_size, resized_width // self.patch_size
299
+ patches = patches.reshape(
300
+ grid_t,
301
+ self.temporal_patch_size,
302
+ channel,
303
+ grid_h // self.merge_size,
304
+ self.merge_size,
305
+ self.patch_size,
306
+ grid_w // self.merge_size,
307
+ self.merge_size,
308
+ self.patch_size,
309
+ )
310
+ patches = patches.transpose(0, 3, 6, 4, 7, 2, 1, 5, 8)
311
+ flatten_patches = patches.reshape(
312
+ grid_t * grid_h * grid_w, channel * self.temporal_patch_size * self.patch_size * self.patch_size
313
+ )
314
+
315
+ return flatten_patches, (grid_t, grid_h, grid_w)
316
+
317
+ def preprocess(
318
+ self,
319
+ images: ImageInput,
320
+ videos: VideoInput = None,
321
+ do_resize: bool = None,
322
+ size: Dict[str, int] = None,
323
+ resample: PILImageResampling = None,
324
+ do_rescale: bool = None,
325
+ rescale_factor: float = None,
326
+ do_normalize: bool = None,
327
+ image_mean: Optional[Union[float, List[float]]] = None,
328
+ image_std: Optional[Union[float, List[float]]] = None,
329
+ do_convert_rgb: bool = None,
330
+ return_tensors: Optional[Union[str, TensorType]] = None,
331
+ data_format: Optional[ChannelDimension] = ChannelDimension.FIRST,
332
+ input_data_format: Optional[Union[str, ChannelDimension]] = None,
333
+ ):
334
+ """
335
+ Args:
336
+ images (`ImageInput`):
337
+ Image to preprocess. Expects a single or batch of images with pixel values ranging from 0 to 255. If
338
+ passing in images with pixel values between 0 and 1, set `do_rescale=False`.
339
+ videos (`VideoInput`):
340
+ Video to preprocess. Expects a single or batch of videos with pixel values ranging from 0 to 255. If
341
+ passing in videos with pixel values between 0 and 1, set `do_rescale=False`.
342
+ do_resize (`bool`, *optional*, defaults to `self.do_resize`):
343
+ Whether to resize the image.
344
+ size (`Dict[str, int]`, *optional*, defaults to `self.size`):
345
+ Size of the image after resizing. Shortest edge of the image is resized to size["shortest_edge"], with
346
+ the longest edge resized to keep the input aspect ratio.
347
+ resample (`int`, *optional*, defaults to `self.resample`):
348
+ Resampling filter to use if resizing the image. This can be one of the enum `PILImageResampling`. Only
349
+ has an effect if `do_resize` is set to `True`.
350
+ do_rescale (`bool`, *optional*, defaults to `self.do_rescale`):
351
+ Whether to rescale the image.
352
+ rescale_factor (`float`, *optional*, defaults to `self.rescale_factor`):
353
+ Rescale factor to rescale the image by if `do_rescale` is set to `True`.
354
+ do_normalize (`bool`, *optional*, defaults to `self.do_normalize`):
355
+ Whether to normalize the image.
356
+ image_mean (`float` or `List[float]`, *optional*, defaults to `self.image_mean`):
357
+ Image mean to use for normalization. Only has an effect if `do_normalize` is set to `True`.
358
+ image_std (`float` or `List[float]`, *optional*, defaults to `self.image_std`):
359
+ Image standard deviation to use for normalization. Only has an effect if `do_normalize` is set to
360
+ `True`.
361
+ do_convert_rgb (`bool`, *optional*, defaults to `self.do_convert_rgb`):
362
+ Whether to convert the image to RGB.
363
+ return_tensors (`str` or `TensorType`, *optional*):
364
+ The type of tensors to return. Can be one of:
365
+ - Unset: Return a list of `np.ndarray`.
366
+ - `TensorType.TENSORFLOW` or `'tf'`: Return a batch of type `tf.Tensor`.
367
+ - `TensorType.PYTORCH` or `'pt'`: Return a batch of type `torch.Tensor`.
368
+ - `TensorType.NUMPY` or `'np'`: Return a batch of type `np.ndarray`.
369
+ - `TensorType.JAX` or `'jax'`: Return a batch of type `jax.numpy.ndarray`.
370
+ data_format (`ChannelDimension` or `str`, *optional*, defaults to `ChannelDimension.FIRST`):
371
+ The channel dimension format for the output image. Can be one of:
372
+ - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
373
+ - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
374
+ - Unset: Use the channel dimension format of the input image.
375
+ input_data_format (`ChannelDimension` or `str`, *optional*):
376
+ The channel dimension format for the input image. If unset, the channel dimension format is inferred
377
+ from the input image. Can be one of:
378
+ - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
379
+ - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
380
+ - `"none"` or `ChannelDimension.NONE`: image in (height, width) format.
381
+
382
+ """
383
+ do_resize = do_resize if do_resize is not None else self.do_resize
384
+ size = size if size is not None else self.size
385
+ resample = resample if resample is not None else self.resample
386
+ do_rescale = do_rescale if do_rescale is not None else self.do_rescale
387
+ rescale_factor = rescale_factor if rescale_factor is not None else self.rescale_factor
388
+ do_normalize = do_normalize if do_normalize is not None else self.do_normalize
389
+ image_mean = image_mean if image_mean is not None else self.image_mean
390
+ image_std = image_std if image_std is not None else self.image_std
391
+ do_convert_rgb = do_convert_rgb if do_convert_rgb is not None else self.do_convert_rgb
392
+
393
+ if images is not None:
394
+ images = make_batched_images(images)
395
+ if videos is not None:
396
+ videos = make_batched_videos(videos)
397
+
398
+ if images is not None and not valid_images(images):
399
+ raise ValueError(
400
+ "Invalid image type. Must be of type PIL.Image.Image, numpy.ndarray, "
401
+ "torch.Tensor, tf.Tensor or jax.ndarray."
402
+ )
403
+
404
+ validate_preprocess_arguments(
405
+ rescale_factor=rescale_factor,
406
+ do_normalize=do_normalize,
407
+ image_mean=image_mean,
408
+ image_std=image_std,
409
+ do_resize=do_resize,
410
+ size=size,
411
+ resample=resample,
412
+ )
413
+
414
+ if images is not None:
415
+ pixel_values, vision_grid_thws = [], []
416
+ for image in images:
417
+ patches, image_grid_thw = self._preprocess(
418
+ image,
419
+ do_resize=do_resize,
420
+ resample=resample,
421
+ do_rescale=do_rescale,
422
+ rescale_factor=rescale_factor,
423
+ do_normalize=do_normalize,
424
+ image_mean=image_mean,
425
+ image_std=image_std,
426
+ data_format=data_format,
427
+ do_convert_rgb=do_convert_rgb,
428
+ input_data_format=input_data_format,
429
+ )
430
+ pixel_values.extend(patches)
431
+ vision_grid_thws.append(image_grid_thw)
432
+ pixel_values = np.array(pixel_values)
433
+ vision_grid_thws = np.array(vision_grid_thws)
434
+ data = {"pixel_values": pixel_values, "image_grid_thw": vision_grid_thws}
435
+
436
+ if videos is not None:
437
+ pixel_values, vision_grid_thws = [], []
438
+ for images in videos:
439
+ patches, video_grid_thw = self._preprocess(
440
+ images,
441
+ do_resize=do_resize,
442
+ resample=resample,
443
+ do_rescale=do_rescale,
444
+ rescale_factor=rescale_factor,
445
+ do_normalize=do_normalize,
446
+ image_mean=image_mean,
447
+ image_std=image_std,
448
+ data_format=data_format,
449
+ do_convert_rgb=do_convert_rgb,
450
+ input_data_format=input_data_format,
451
+ )
452
+ pixel_values.extend(patches)
453
+ vision_grid_thws.append(video_grid_thw)
454
+ pixel_values = np.array(pixel_values)
455
+ vision_grid_thws = np.array(vision_grid_thws)
456
+ data = {"pixel_values_videos": pixel_values, "video_grid_thw": vision_grid_thws}
457
+
458
+ return BatchFeature(data=data, tensor_type=return_tensors)
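A minimal usage sketch for the processor above, assuming `image_processing_dimple.py` is importable locally (e.g. the repository has been cloned onto `PYTHONPATH`) and using the constructor defaults (`patch_size=14`, `merge_size=2`, `temporal_patch_size=2`). The shapes follow from `smart_resize` and the patch flattening at the end of `_preprocess`.

```python
# Sketch only: exercise smart_resize and DimpleImageProcessor on a dummy image.
import numpy as np
from PIL import Image

from image_processing_dimple import DimpleImageProcessor, smart_resize

# smart_resize snaps (height, width) to multiples of patch_size * merge_size = 28
# while keeping the total pixel count inside [min_pixels, max_pixels].
print(smart_resize(480, 640, factor=28))     # (476, 644)

processor = DimpleImageProcessor()           # defaults: patch 14, merge 2, temporal 2
image = Image.fromarray(np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8))
out = processor(images=image, return_tensors="np")

# Each row of pixel_values is one flattened patch: channels * temporal * 14 * 14 = 1176.
print(out["pixel_values"].shape)             # (1 * 34 * 46, 1176) = (1564, 1176)
# The grid is reported in 14-pixel patches, before the 2x2 merge into LLM tokens.
print(out["image_grid_thw"])                 # [[ 1 34 46]]
```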
merges.txt ADDED
The diff for this file is too large to render. See raw diff
preprocessor_config.json ADDED
@@ -0,0 +1,33 @@
1
+ {
2
+ "auto_map": {
3
+ "AutoImageProcessor": "image_processing_dimple.DimpleImageProcessor",
4
+ "AutoProcessor": "processing_dimple.DimpleProcessor"
5
+ },
6
+ "do_convert_rgb": true,
7
+ "do_normalize": true,
8
+ "do_rescale": true,
9
+ "do_resize": true,
10
+ "image_mean": [
11
+ 0.48145466,
12
+ 0.4578275,
13
+ 0.40821073
14
+ ],
15
+ "image_processor_type": "DimpleImageProcessor",
16
+ "image_std": [
17
+ 0.26862954,
18
+ 0.26130258,
19
+ 0.27577711
20
+ ],
21
+ "max_pixels": 112896.0,
22
+ "merge_size": 2,
23
+ "min_pixels": 3136,
24
+ "patch_size": 14,
25
+ "processor_class": "DimpleProcessor",
26
+ "resample": 3,
27
+ "rescale_factor": 0.00392156862745098,
28
+ "size": {
29
+ "max_pixels": 112896.0,
30
+ "min_pixels": 3136
31
+ },
32
+ "temporal_patch_size": 2
33
+ }
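An editor's note on the numbers above: with `patch_size` 14 and `merge_size` 2, resized sides are multiples of 28 pixels, and the `min_pixels`/`max_pixels` budget caps how many image tokens an image can produce. A small arithmetic check using only the values in this config:

```python
# Quick check of the pixel budget in preprocessor_config.json.
patch_size, merge_size = 14, 2
factor = patch_size * merge_size        # 28: resized height and width are multiples of 28
min_pixels, max_pixels = 3136, 112896

print(min_pixels == 56 * 56, max_pixels == 336 * 336)   # True True
# At most max_pixels / 28^2 merged 28x28 cells fit in the budget,
# i.e. at most 144 <|image_pad|> tokens per image after the 2x2 merge.
print(max_pixels // (factor * factor))                   # 144
```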
processing_dimple.py ADDED
@@ -0,0 +1,178 @@
1
+ # coding=utf-8
2
+ # Copyright 2024 The Dimple team and the HuggingFace Inc. team. All rights reserved.
3
+ #
4
+ # This code is based on EleutherAI's GPT-NeoX library and the GPT-NeoX
5
+ # and OPT implementations in this library. It has been modified from its
6
+ # original forms to accommodate minor architectural differences compared
7
+ # to GPT-NeoX and OPT used by the Meta AI team that trained the model.
8
+ #
9
+ # Licensed under the Apache License, Version 2.0 (the "License");
10
+ # you may not use this file except in compliance with the License.
11
+ # You may obtain a copy of the License at
12
+ #
13
+ # http://www.apache.org/licenses/LICENSE-2.0
14
+ #
15
+ # Unless required by applicable law or agreed to in writing, software
16
+ # distributed under the License is distributed on an "AS IS" BASIS,
17
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
18
+ # See the License for the specific language governing permissions and
19
+ # limitations under the License.
20
+ """
21
+ Processor class for Dimple.
22
+ """
23
+
24
+ from typing import List, Union
25
+
26
+ from transformers.feature_extraction_utils import BatchFeature
27
+ from transformers.image_utils import ImageInput, VideoInput
28
+ from transformers.processing_utils import ProcessingKwargs, ProcessorMixin, Unpack
29
+ from transformers.tokenization_utils_base import PreTokenizedInput, TextInput
30
+ from transformers.utils import logging
31
+
32
+ from .image_processing_dimple import DimpleImageProcessor
33
+ from .tokenization_dimple import DimpleTokenizer
34
+
35
+
36
+ logger = logging.get_logger("Dimple."+__name__)
37
+
38
+
39
+ class DimpleProcessorKwargs(ProcessingKwargs, total=False):
40
+ _defaults = {
41
+ "text_kwargs": {
42
+ "padding": False,
43
+ },
44
+ }
45
+
46
+
47
+ class DimpleProcessor(ProcessorMixin):
48
+ r"""
49
+ Constructs a Dimple processor which wraps a Dimple image processor and a Dimple tokenizer into a single processor.
50
+ [`DimpleProcessor`] offers all the functionalities of [`DimpleImageProcessor`] and [`DimpleTokenizer`]. See the
51
+ [`~DimpleProcessor.__call__`] and [`~DimpleProcessor.decode`] for more information.
52
+ Args:
53
+ image_processor ([`DimpleImageProcessor`], *optional*):
54
+ The image processor is a required input.
55
+ tokenizer ([`DimpleTokenizer`], *optional*):
56
+ The tokenizer is a required input.
57
+ chat_template (`str`, *optional*): A Jinja template which will be used to convert lists of messages
58
+ in a chat into a tokenizable string.
59
+ """
60
+
61
+ attributes = ["image_processor", "tokenizer"]
62
+ valid_kwargs = ["chat_template"]
63
+ image_processor_class = "AutoImageProcessor"
64
+ tokenizer_class = "AutoTokenizer"
65
+
66
+ def __init__(self, image_processor=None, tokenizer=None, chat_template=None, **kwargs):
67
+ super().__init__(image_processor, tokenizer, chat_template=chat_template)
68
+
69
+ def __call__(
70
+ self,
71
+ images: ImageInput = None,
72
+ text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
73
+ videos: VideoInput = None,
74
+ **kwargs: Unpack[DimpleProcessorKwargs],
75
+ ) -> BatchFeature:
76
+ """
77
+ Main method to prepare one or several sequence(s) and image(s) for the model. This method forwards the `text`
78
+ and `kwargs` arguments to DimpleTokenizer's [`~DimpleTokenizer.__call__`] if `text` is not `None` to encode
79
+ the text. To prepare the vision inputs, this method forwards the `vision_infos` and `kwargs` arguments to
80
+ DimpleImageProcessor's [`~DimpleImageProcessor.__call__`] if `vision_infos` is not `None`.
81
+
82
+ Args:
83
+ images (`PIL.Image.Image`, `np.ndarray`, `torch.Tensor`, `List[PIL.Image.Image]`, `List[np.ndarray]`, `List[torch.Tensor]`):
84
+ The image or batch of images to be prepared. Each image can be a PIL image, NumPy array or PyTorch
85
+ tensor. Both channels-first and channels-last formats are supported.
86
+ text (`str`, `List[str]`, `List[List[str]]`):
87
+ The sequence or batch of sequences to be encoded. Each sequence can be a string or a list of strings
88
+ (pretokenized string). If the sequences are provided as list of strings (pretokenized), you must set
89
+ `is_split_into_words=True` (to lift the ambiguity with a batch of sequences).
90
+ videos (`np.ndarray`, `torch.Tensor`, `List[np.ndarray]`, `List[torch.Tensor]`):
91
+ The video or batch of videos to be prepared. Each video can be a 4D NumPy array or PyTorch
92
+ tensor, or a nested list of 3D frames. Both channels-first and channels-last formats are supported.
93
+ return_tensors (`str` or [`~utils.TensorType`], *optional*):
94
+ If set, will return tensors of a particular framework. Acceptable values are:
95
+ - `'tf'`: Return TensorFlow `tf.constant` objects.
96
+ - `'pt'`: Return PyTorch `torch.Tensor` objects.
97
+ - `'np'`: Return NumPy `np.ndarray` objects.
98
+ - `'jax'`: Return JAX `jnp.ndarray` objects.
99
+
100
+ Returns:
101
+ [`BatchFeature`]: A [`BatchFeature`] with the following fields:
102
+
103
+ - **input_ids** -- List of token ids to be fed to a model. Returned when `text` is not `None`.
104
+ - **attention_mask** -- List of indices specifying which tokens should be attended to by the model (when
105
+ `return_attention_mask=True` or if *"attention_mask"* is in `self.model_input_names` and if `text` is not
106
+ `None`).
107
+ - **pixel_values** -- Pixel values to be fed to a model. Returned when `images` is not `None`.
108
+ - **pixel_values_videos** -- Pixel values of videos to be fed to a model. Returned when `videos` is not `None`.
109
+ - **image_grid_thw** -- List of 3D (temporal, height, width) grids for the images, as consumed by the LLM. Returned when `images` is not `None`.
110
+ - **video_grid_thw** -- List of 3D (temporal, height, width) grids for the videos, as consumed by the LLM. Returned when `videos` is not `None`.
111
+ """
112
+ output_kwargs = self._merge_kwargs(
113
+ DimpleProcessorKwargs,
114
+ tokenizer_init_kwargs=self.tokenizer.init_kwargs,
115
+ **kwargs,
116
+ )
117
+ if images is not None:
118
+ image_inputs = self.image_processor(images=images, videos=None, **output_kwargs["images_kwargs"])
119
+ image_grid_thw = image_inputs["image_grid_thw"]
120
+ else:
121
+ image_inputs = {}
122
+ image_grid_thw = None
123
+
124
+ if videos is not None:
125
+ videos_inputs = self.image_processor(images=None, videos=videos, **output_kwargs["videos_kwargs"])
126
+ video_grid_thw = videos_inputs["video_grid_thw"]
127
+ else:
128
+ videos_inputs = {}
129
+ video_grid_thw = None
130
+
131
+ if not isinstance(text, list):
132
+ text = [text]
133
+
134
+ if image_grid_thw is not None:
135
+ merge_length = self.image_processor.merge_size**2
136
+ index = 0
137
+ for i in range(len(text)):
138
+ while "<|image_pad|>" in text[i]:
139
+ text[i] = text[i].replace(
140
+ "<|image_pad|>", "<|placeholder|>" * (image_grid_thw[index].prod() // merge_length), 1
141
+ )
142
+ index += 1
143
+ text[i] = text[i].replace("<|placeholder|>", "<|image_pad|>")
144
+
145
+ if video_grid_thw is not None:
146
+ merge_length = self.image_processor.merge_size**2
147
+ index = 0
148
+ for i in range(len(text)):
149
+ while "<|video_pad|>" in text[i]:
150
+ text[i] = text[i].replace(
151
+ "<|video_pad|>", "<|placeholder|>" * (video_grid_thw[index].prod() // merge_length), 1
152
+ )
153
+ index += 1
154
+ text[i] = text[i].replace("<|placeholder|>", "<|video_pad|>")
155
+
156
+ text_inputs = self.tokenizer(text, **output_kwargs["text_kwargs"])
157
+
158
+ return BatchFeature(data={**text_inputs, **image_inputs, **videos_inputs})
159
+
160
+ def batch_decode(self, *args, **kwargs):
161
+ """
162
+ This method forwards all its arguments to DimpleTokenizer's [`~PreTrainedTokenizer.batch_decode`]. Please
163
+ refer to the docstring of this method for more information.
164
+ """
165
+ return self.tokenizer.batch_decode(*args, **kwargs)
166
+
167
+ def decode(self, *args, **kwargs):
168
+ """
169
+ This method forwards all its arguments to DimpleTokenizer's [`~PreTrainedTokenizer.decode`]. Please refer to
170
+ the docstring of this method for more information.
171
+ """
172
+ return self.tokenizer.decode(*args, **kwargs)
173
+
174
+ @property
175
+ def model_input_names(self):
176
+ tokenizer_input_names = self.tokenizer.model_input_names
177
+ image_processor_input_names = self.image_processor.model_input_names
178
+ return list(dict.fromkeys(tokenizer_input_names + image_processor_input_names))
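A standalone sketch (no downloads, illustrative grid values) of the placeholder expansion performed in `__call__` above: each `<|image_pad|>` in the prompt is widened so that the number of image tokens matches the number of merged vision patches reported in `image_grid_thw`.

```python
# Standalone sketch of the <|image_pad|> expansion in DimpleProcessor.__call__.
import numpy as np

image_grid_thw = np.array([[1, 34, 46]])   # illustrative: a 480x640 image resized to 476x644
merge_size = 2
merge_length = merge_size ** 2

text = "<|vision_start|><|image_pad|><|vision_end|>Describe this image."
index = 0
while "<|image_pad|>" in text:
    # One pad token per merged 2x2 patch, exactly as in __call__ above.
    n_tokens = int(image_grid_thw[index].prod()) // merge_length   # 1*34*46 // 4 = 391
    text = text.replace("<|image_pad|>", "<|placeholder|>" * n_tokens, 1)
    index += 1
text = text.replace("<|placeholder|>", "<|image_pad|>")

print(text.count("<|image_pad|>"))   # 391 image tokens for this image
```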
processor_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "auto_map": {
3
+ "AutoProcessor": "processing_dimple.DimpleProcessor"
4
+ },
5
+ "processor_class": "DimpleProcessor"
6
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|vision_start|>",
4
+ "<|vision_end|>",
5
+ "<|vision_pad|>",
6
+ "<|image_pad|>",
7
+ "<|video_pad|>"
8
+ ],
9
+ "bos_token": {
10
+ "content": "<|beginoftext|>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "eos_token": {
17
+ "content": "<|endoftext|>",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "mask_token": {
24
+ "content": "<|mask|>",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "pad_token": {
31
+ "content": "<|endoftext|>",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
tokenization_dimple.py ADDED
@@ -0,0 +1,340 @@
1
+ # coding=utf-8
2
+ # Copyright 2024 The Dimple team and The HuggingFace Inc. team. All rights reserved.
3
+ #
4
+ # This code is based on Qwen's implementations in this library.
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+ """Tokenization classes for Dimple."""
17
+
18
+ import json
19
+ import os
20
+ import unicodedata
21
+ from functools import lru_cache
22
+ from typing import Optional, Tuple
23
+
24
+ import regex as re
25
+
26
+ from transformers.tokenization_utils import AddedToken, PreTrainedTokenizer
27
+ from transformers.utils import logging
28
+
29
+
30
+ logger = logging.get_logger("Dimple."+__name__)
31
+
32
+ VOCAB_FILES_NAMES = {
33
+ "vocab_file": "vocab.json",
34
+ "merges_file": "merges.txt",
35
+ }
36
+
37
+
38
+ MAX_MODEL_INPUT_SIZES = {"rp-yu/Dimple-v0-Base-7B": 32768}
39
+
40
+ PRETOKENIZE_REGEX = r"""(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+"""
41
+
42
+
43
+ @lru_cache()
44
+ # Copied from transformers.models.gpt2.tokenization_gpt2.bytes_to_unicode
45
+ def bytes_to_unicode():
46
+ """
47
+ Returns a mapping between utf-8 bytes and unicode strings. We specifically avoid mapping to whitespace/control
48
+ characters the bpe code barfs on.
49
+
50
+ The reversible bpe codes work on unicode strings. This means you need a large # of unicode characters in your vocab
51
+ if you want to avoid UNKs. When you're at something like a 10B token dataset you end up needing around 5K for
52
+ decent coverage. This is a significant percentage of your normal, say, 32K bpe vocab. To avoid that, we want lookup
53
+ tables between utf-8 bytes and unicode strings.
54
+ """
55
+ bs = (
56
+ list(range(ord("!"), ord("~") + 1)) + list(range(ord("¡"), ord("¬") + 1)) + list(range(ord("®"), ord("ÿ") + 1))
57
+ )
58
+ cs = bs[:]
59
+ n = 0
60
+ for b in range(2**8):
61
+ if b not in bs:
62
+ bs.append(b)
63
+ cs.append(2**8 + n)
64
+ n += 1
65
+ cs = [chr(n) for n in cs]
66
+ return dict(zip(bs, cs))
67
+
68
+
69
+ # Copied from transformers.models.gpt2.tokenization_gpt2.get_pairs
70
+ def get_pairs(word):
71
+ """
72
+ Return set of symbol pairs in a word.
73
+
74
+ Word is represented as tuple of symbols (symbols being variable-length strings).
75
+ """
76
+ pairs = set()
77
+ prev_char = word[0]
78
+ for char in word[1:]:
79
+ pairs.add((prev_char, char))
80
+ prev_char = char
81
+ return pairs
82
+
83
+
84
+ class DimpleTokenizer(PreTrainedTokenizer):
85
+ """
86
+ Construct a Dimple tokenizer. Based on byte-level Byte-Pair-Encoding.
87
+
88
+ As with GPT2Tokenizer, this tokenizer has been trained to treat spaces like parts of the tokens, so a word will
89
+ be encoded differently depending on whether it is at the beginning of the sentence (without space) or not:
90
+
91
+ ```python
92
+ >>> from transformers import AutoTokenizer
93
+
94
+ >>> tokenizer = AutoTokenizer.from_pretrained("rp-yu/Dimple-v0-Base-7B", trust_remote_code=True)
95
+ >>> tokenizer("Hello world")["input_ids"]
96
+ [9707, 1879]
97
+
98
+ >>> tokenizer(" Hello world")["input_ids"]
99
+ [21927, 1879]
100
+ ```
101
+ This is expected.
102
+
103
+ You should not use GPT2Tokenizer instead, because of the different pretokenization rules.
104
+
105
+ This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should refer to
106
+ this superclass for more information regarding those methods.
107
+
108
+ Args:
109
+ vocab_file (`str`):
110
+ Path to the vocabulary file.
111
+ merges_file (`str`):
112
+ Path to the merges file.
113
+ errors (`str`, *optional*, defaults to `"replace"`):
114
+ Paradigm to follow when decoding bytes to UTF-8. See
115
+ [bytes.decode](https://docs.python.org/3/library/stdtypes.html#bytes.decode) for more information.
116
+ unk_token (`str`, *optional*, defaults to `"<|endoftext|>"`):
117
+ The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
118
+ token instead.
119
+ bos_token (`str`, *optional*):
120
+ The beginning of sequence token. Not applicable for this tokenizer.
121
+ eos_token (`str`, *optional*, defaults to `"<|endoftext|>"`):
122
+ The end of sequence token.
123
+ pad_token (`str`, *optional*, defaults to `"<|endoftext|>"`):
124
+ The token used for padding, for example when batching sequences of different lengths.
125
+ clean_up_tokenization_spaces (`bool`, *optional*, defaults to `False`):
126
+ Whether or not the model should cleanup the spaces that were added when splitting the input text during the
127
+ tokenization process. Not applicable to this tokenizer, since tokenization does not add spaces.
128
+ split_special_tokens (`bool`, *optional*, defaults to `False`):
129
+ Whether or not the special tokens should be split during the tokenization process. The default behavior is
130
+ to not split special tokens. This means that if `<|endoftext|>` is the `eos_token`, then `tokenizer.tokenize("<|endoftext|>")` returns
131
+ `['<|endoftext|>']`. Otherwise, if `split_special_tokens=True`, then `tokenizer.tokenize("<|endoftext|>")` will give `['<',
132
+ '|', 'endo', 'ft', 'ext', '|', '>']`. This argument is only supported for `slow` tokenizers for the moment.
133
+ """
134
+
135
+ vocab_files_names = VOCAB_FILES_NAMES
136
+ model_input_names = ["input_ids", "attention_mask"]
137
+
138
+ def __init__(
139
+ self,
140
+ vocab_file,
141
+ merges_file,
142
+ errors="replace",
143
+ unk_token="<|endoftext|>",
144
+ bos_token=None,
145
+ eos_token="<|endoftext|>",
146
+ pad_token="<|endoftext|>",
147
+ clean_up_tokenization_spaces=False,
148
+ split_special_tokens=False,
149
+ **kwargs,
150
+ ):
151
+ # Dimple vocab does not contain control tokens; added tokens need to be special
152
+ bos_token = (
153
+ AddedToken(bos_token, lstrip=False, rstrip=False, special=True, normalized=False)
154
+ if isinstance(bos_token, str)
155
+ else bos_token
156
+ )
157
+ eos_token = (
158
+ AddedToken(eos_token, lstrip=False, rstrip=False, special=True, normalized=False)
159
+ if isinstance(eos_token, str)
160
+ else eos_token
161
+ )
162
+ unk_token = (
163
+ AddedToken(unk_token, lstrip=False, rstrip=False, special=True, normalized=False)
164
+ if isinstance(unk_token, str)
165
+ else unk_token
166
+ )
167
+ pad_token = (
168
+ AddedToken(pad_token, lstrip=False, rstrip=False, special=True, normalized=False)
169
+ if isinstance(pad_token, str)
170
+ else pad_token
171
+ )
172
+
173
+ with open(vocab_file, encoding="utf-8") as vocab_handle:
174
+ self.encoder = json.load(vocab_handle)
175
+ self.decoder = {v: k for k, v in self.encoder.items()}
176
+ self.errors = errors # how to handle errors in decoding
177
+ self.byte_encoder = bytes_to_unicode()
178
+ self.byte_decoder = {v: k for k, v in self.byte_encoder.items()}
179
+ bpe_merges = []
180
+ with open(merges_file, encoding="utf-8") as merges_handle:
181
+ for i, line in enumerate(merges_handle):
182
+ line = line.strip()
183
+ if (i == 0 and line.startswith("#version:")) or not line:
184
+ continue
185
+ bpe_merges.append(tuple(line.split()))
186
+ self.bpe_ranks = dict(zip(bpe_merges, range(len(bpe_merges))))
187
+ # NOTE: the cache can grow without bound and will get really large for long running processes
188
+ # (esp. for texts of language that do not use space between word, e.g. Chinese); technically
189
+ # not a memory leak but appears as one.
190
+ # GPT2Tokenizer has the same problem, so let's be consistent.
191
+ self.cache = {}
192
+
193
+ self.pat = re.compile(PRETOKENIZE_REGEX)
194
+
195
+ if kwargs.get("add_prefix_space", False):
196
+ logger.warning_once(
197
+ f"{self.__class__.__name__} does not support `add_prefix_space`, setting it to True has no effect."
198
+ )
199
+
200
+ super().__init__(
201
+ errors=errors,
202
+ bos_token=bos_token,
203
+ eos_token=eos_token,
204
+ pad_token=pad_token,
205
+ unk_token=unk_token,
206
+ clean_up_tokenization_spaces=clean_up_tokenization_spaces,
207
+ split_special_tokens=split_special_tokens,
208
+ **kwargs,
209
+ )
210
+
211
+ @property
212
+ def vocab_size(self) -> int:
213
+ return len(self.encoder)
214
+
215
+ # Copied from transformers.models.gpt2.tokenization_gpt2.GPT2Tokenizer.get_vocab
216
+ def get_vocab(self):
217
+ return dict(self.encoder, **self.added_tokens_encoder)
218
+
219
+ # Copied from transformers.models.gpt2.tokenization_gpt2.GPT2Tokenizer.bpe
220
+ def bpe(self, token):
221
+ if token in self.cache:
222
+ return self.cache[token]
223
+ word = tuple(token)
224
+ pairs = get_pairs(word)
225
+
226
+ if not pairs:
227
+ return token
228
+
229
+ while True:
230
+ bigram = min(pairs, key=lambda pair: self.bpe_ranks.get(pair, float("inf")))
231
+ if bigram not in self.bpe_ranks:
232
+ break
233
+ first, second = bigram
234
+ new_word = []
235
+ i = 0
236
+ while i < len(word):
237
+ try:
238
+ j = word.index(first, i)
239
+ except ValueError:
240
+ new_word.extend(word[i:])
241
+ break
242
+ else:
243
+ new_word.extend(word[i:j])
244
+ i = j
245
+
246
+ if word[i] == first and i < len(word) - 1 and word[i + 1] == second:
247
+ new_word.append(first + second)
248
+ i += 2
249
+ else:
250
+ new_word.append(word[i])
251
+ i += 1
252
+ new_word = tuple(new_word)
253
+ word = new_word
254
+ if len(word) == 1:
255
+ break
256
+ else:
257
+ pairs = get_pairs(word)
258
+ word = " ".join(word)
259
+ self.cache[token] = word
260
+ return word
261
+
262
+ # Copied from transformers.models.gpt2.tokenization_gpt2.GPT2Tokenizer._tokenize
263
+ def _tokenize(self, text):
264
+ """Tokenize a string."""
265
+ bpe_tokens = []
266
+ for token in re.findall(self.pat, text):
267
+ token = "".join(
268
+ self.byte_encoder[b] for b in token.encode("utf-8")
269
+ ) # Maps all our bytes to unicode strings, avoiding control tokens of the BPE (spaces in our case)
270
+ bpe_tokens.extend(bpe_token for bpe_token in self.bpe(token).split(" "))
271
+ return bpe_tokens
272
+
273
+ # Copied from transformers.models.gpt2.tokenization_gpt2.GPT2Tokenizer._convert_token_to_id
274
+ def _convert_token_to_id(self, token):
275
+ """Converts a token (str) in an id using the vocab."""
276
+ return self.encoder.get(token, self.encoder.get(self.unk_token))
277
+
278
+ # Copied from transformers.models.gpt2.tokenization_gpt2.GPT2Tokenizer._convert_id_to_token
279
+ def _convert_id_to_token(self, index):
280
+ """Converts an index (integer) in a token (str) using the vocab."""
281
+ return self.decoder.get(index)
282
+
283
+ # Copied from transformers.models.gpt2.tokenization_gpt2.GPT2Tokenizer.convert_tokens_to_string
284
+ def convert_tokens_to_string(self, tokens):
285
+ """Converts a sequence of tokens (string) in a single string."""
286
+ text = "".join(tokens)
287
+ text = bytearray([self.byte_decoder[c] for c in text]).decode("utf-8", errors=self.errors)
288
+ return text
289
+
290
+ def decode(
291
+ self,
292
+ token_ids,
293
+ skip_special_tokens: bool = False,
294
+ clean_up_tokenization_spaces: Optional[bool] = False,
295
+ spaces_between_special_tokens: bool = False,
296
+ **kwargs,
297
+ ) -> str:
298
+ # `spaces_between_special_tokens` defaults to True for _decode in slow tokenizers
299
+ # and cannot be configured elsewhere, but it should default to False for DimpleTokenizer
300
+ return super().decode(
301
+ token_ids,
302
+ skip_special_tokens=skip_special_tokens,
303
+ clean_up_tokenization_spaces=clean_up_tokenization_spaces,
304
+ spaces_between_special_tokens=spaces_between_special_tokens,
305
+ **kwargs,
306
+ )
307
+
308
+ # Copied from transformers.models.gpt2.tokenization_gpt2.GPT2Tokenizer.save_vocabulary
309
+ def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
310
+ if not os.path.isdir(save_directory):
311
+ logger.error(f"Vocabulary path ({save_directory}) should be a directory")
312
+ return
313
+ vocab_file = os.path.join(
314
+ save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
315
+ )
316
+ merge_file = os.path.join(
317
+ save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["merges_file"]
318
+ )
319
+
320
+ with open(vocab_file, "w", encoding="utf-8") as f:
321
+ f.write(json.dumps(self.encoder, indent=2, sort_keys=True, ensure_ascii=False) + "\n")
322
+
323
+ index = 0
324
+ with open(merge_file, "w", encoding="utf-8") as writer:
325
+ writer.write("#version: 0.2\n")
326
+ for bpe_tokens, token_index in sorted(self.bpe_ranks.items(), key=lambda kv: kv[1]):
327
+ if index != token_index:
328
+ logger.warning(
329
+ f"Saving vocabulary to {merge_file}: BPE merge indices are not consecutive."
330
+ " Please check that the tokenizer is not corrupted!"
331
+ )
332
+ index = token_index
333
+ writer.write(" ".join(bpe_tokens) + "\n")
334
+ index += 1
335
+
336
+ return vocab_file, merge_file
337
+
338
+ def prepare_for_tokenization(self, text, **kwargs):
339
+ text = unicodedata.normalize("NFC", text)
340
+ return (text, kwargs)
tokenizer_config.json ADDED
@@ -0,0 +1,225 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ },
181
+ "151665": {
182
+ "content": "<|beginoftext|>",
183
+ "lstrip": false,
184
+ "normalized": false,
185
+ "rstrip": false,
186
+ "single_word": false,
187
+ "special": true
188
+ },
189
+ "151666": {
190
+ "content": "<|mask|>",
191
+ "lstrip": false,
192
+ "normalized": false,
193
+ "rstrip": false,
194
+ "single_word": false,
195
+ "special": true
196
+ }
197
+ },
198
+ "additional_special_tokens": [
199
+ "<|vision_start|>",
200
+ "<|vision_end|>",
201
+ "<|vision_pad|>",
202
+ "<|image_pad|>",
203
+ "<|video_pad|>"
204
+ ],
205
+ "auto_map": {
206
+ "AutoProcessor": "processing_dimple.DimpleProcessor",
207
+ "AutoTokenizer": [
208
+ "tokenization_dimple.DimpleTokenizer",
209
+ null
210
+ ]
211
+ },
212
+ "bos_token": "<|beginoftext|>",
213
+ "chat_template": "{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n{% endif %}<|im_start|>{{ message['role'] }}\n{% if message['content'] is string %}{{ message['content'] }}<|im_end|>\n{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>\n{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}",
214
+ "clean_up_tokenization_spaces": false,
215
+ "eos_token": "<|endoftext|>",
216
+ "errors": "replace",
217
+ "mask_token": "<|mask|>",
218
+ "model_max_length": 131072,
219
+ "pad_token": "<|endoftext|>",
220
+ "padding_side": "left",
221
+ "processor_class": "DimpleProcessor",
222
+ "split_special_tokens": false,
223
+ "tokenizer_class": "DimpleTokenizer",
224
+ "unk_token": null
225
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff