Blenderbot¶
DISCLAIMER: If you see something strange, file a GitHub Issue.
Overview¶
The Blender chatbot model was proposed in Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau and Jason Weston on 30 Apr 2020.
The abstract of the paper is the following:
Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to their partners, and displaying knowledge, empathy and personality appropriately, while maintaining a consistent persona. We show that large scale models can learn these skills when given appropriate training data and choice of generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter models, and make our models and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing failure cases of our models.
The authors’ code can be found here.
Implementation Notes¶
- Blenderbot uses a standard seq2seq transformer-based architecture. 
- It inherits completely from BartForConditionalGeneration.
- Even though Blenderbot is one model, it uses two tokenizers: BlenderbotSmallTokenizer for the 90M checkpoint and BlenderbotTokenizer for all other checkpoints.
- Calling BlenderbotSmallTokenizer.from_pretrained() will always return a BlenderbotSmallTokenizer, regardless of checkpoint. To use the 3B parameter checkpoint, you must call BlenderbotTokenizer directly, as in the sketch below.
- Available checkpoints can be found in the model hub. 
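For example, the 3B checkpoint must be paired with BlenderbotTokenizer explicitly; a minimal sketch:

>>> from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration
>>> mname = 'facebook/blenderbot-3B'
>>> model = BlenderbotForConditionalGeneration.from_pretrained(mname)
>>> tokenizer = BlenderbotTokenizer.from_pretrained(mname)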
Usage¶
Here is an example of model usage:
>>> from transformers import BlenderbotSmallTokenizer, BlenderbotForConditionalGeneration
>>> mname = 'facebook/blenderbot-90M'
>>> model = BlenderbotForConditionalGeneration.from_pretrained(mname)
>>> tokenizer = BlenderbotSmallTokenizer.from_pretrained(mname)
>>> UTTERANCE = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer([UTTERANCE], return_tensors='pt')
>>> reply_ids = model.generate(**inputs)
>>> print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in reply_ids])
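The list comprehension above can equivalently be written with the tokenizer's batch_decode method:

>>> print(tokenizer.batch_decode(reply_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False))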
Here is how you can check out config values:
>>> from transformers import BlenderbotConfig
>>> config_90 = BlenderbotConfig.from_pretrained("facebook/blenderbot-90M")
>>> config_90.to_diff_dict()  # show the values that differ from the defaults
>>> configuration_3B = BlenderbotConfig.from_pretrained("facebook/blenderbot-3B")
>>> configuration_3B.to_diff_dict()
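A configuration can also be used to instantiate a model with freshly initialized weights (no pretrained parameters are loaded), e.g. for training from scratch:

>>> from transformers import BlenderbotForConditionalGeneration
>>> untrained_model = BlenderbotForConditionalGeneration(config_90)  # random weights, blenderbot-90M architecture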
BlenderbotConfig¶
class transformers.BlenderbotConfig(activation_dropout=0.0, extra_pos_embeddings=0, activation_function='gelu', vocab_size=54944, d_model=512, encoder_ffn_dim=2048, encoder_layers=8, encoder_attention_heads=16, decoder_ffn_dim=2048, decoder_layers=8, decoder_attention_heads=16, encoder_layerdrop=0.0, decoder_layerdrop=0.0, attention_dropout=0.0, dropout=0.1, max_position_embeddings=512, classifier_dropout=0.0, is_encoder_decoder=True, pad_token_id=1, bos_token_id=0, eos_token_id=2, normalize_before=False, add_final_layer_norm=False, do_blenderbot_90_layernorm=True, scale_embedding=False, normalize_embedding=True, static_position_embeddings=False, add_bias_logits=False, force_bos_token_to_be_generated=False, **common_kwargs)[source]¶
- This is the configuration class to store the configuration of a BlenderbotForConditionalGeneration. It inherits from BartConfig and has the same signature with different defaults.
Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.
Parameters
- vocab_size (int, optional, defaults to 54944) – Vocabulary size of the Blenderbot model. Defines the number of different tokens that can be represented by the input_ids passed when calling BlenderbotForConditionalGeneration.
- d_model (int, optional, defaults to 512) – Dimensionality of the layers and the pooler layer.
- encoder_layers (int, optional, defaults to 8) – Number of encoder layers; 6 are used for the blenderbot-90M model.
- decoder_layers (int, optional, defaults to 8) – Number of decoder layers; 6 are used for the blenderbot-90M model.
- encoder_attention_heads (int, optional, defaults to 16) – Number of attention heads for each attention layer in the Transformer encoder.
- decoder_attention_heads (int, optional, defaults to 16) – Number of attention heads for each attention layer in the Transformer decoder.
- decoder_ffn_dim (int, optional, defaults to 2048) – Dimensionality of the “intermediate” (often named feed-forward) layer in the decoder.
- encoder_ffn_dim (int, optional, defaults to 2048) – Dimensionality of the “intermediate” (often named feed-forward) layer in the encoder.
- activation_function (str or function, optional, defaults to "gelu") – The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.
- dropout (float, optional, defaults to 0.1) – The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
- attention_dropout (float, optional, defaults to 0.0) – The dropout ratio for the attention probabilities.
- activation_dropout (float, optional, defaults to 0.0) – The dropout ratio for activations inside the fully connected layer.
- classifier_dropout (float, optional, defaults to 0.0) – The dropout ratio for the classifier.
- max_position_embeddings (int, optional, defaults to 512) – The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
- init_std (float, optional, defaults to 0.02) – The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
- add_bias_logits (bool, optional, defaults to False) – Whether to add a bias term to the final logits. Specific to Marian; left False for Blenderbot.
- normalize_before (bool, optional, defaults to False) – Call layernorm before attention ops.
- normalize_embedding (bool, optional, defaults to True) – Call layernorm after embeddings.
- static_position_embeddings (bool, optional, defaults to False) – Don’t learn positional embeddings, use sinusoidal ones.
- add_final_layer_norm (bool, optional, defaults to False) – Whether to apply a final layernorm after the last encoder and decoder blocks.
- do_blenderbot_90_layernorm (bool, optional, defaults to True) – The blenderbot-90M checkpoint applies layernorm_embedding one step earlier in the decoder.
- scale_embedding (bool, optional, defaults to False) – Scale embeddings by dividing by sqrt(d_model).
- eos_token_id (int, optional, defaults to 2) – End of stream token id.
- pad_token_id (int, optional, defaults to 1) – Padding token id.
- bos_token_id (int, optional, defaults to 0) – Beginning of stream token id.
- encoder_layerdrop (float, optional, defaults to 0.0) – The LayerDrop probability for the encoder. See the LayerDrop paper for more details.
- decoder_layerdrop (float, optional, defaults to 0.0) – The LayerDrop probability for the decoder. See the LayerDrop paper for more details.
- extra_pos_embeddings (int, optional, defaults to 0) – How many extra learned positional embeddings to use. For BART-like models this is typically set to pad_token_id + 1.
- is_encoder_decoder (bool, optional, defaults to True) – Whether this is an encoder/decoder model.
- force_bos_token_to_be_generated (bool, optional, defaults to False) – Whether or not to force the BOS token to be generated at step 1 (after decoder_start_token_id).
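As a sketch of how these parameters compose, the defaults above can be overridden to define a deliberately tiny (hypothetical) architecture; to_diff_dict() then reports the overridden values:

>>> from transformers import BlenderbotConfig
>>> tiny_config = BlenderbotConfig(d_model=256, encoder_layers=2, decoder_layers=2, encoder_ffn_dim=1024, decoder_ffn_dim=1024)
>>> tiny_config.to_diff_dict()  # the overridden values (plus a few bookkeeping fields)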
 
 
BlenderbotTokenizer¶
class transformers.BlenderbotTokenizer(vocab_file, merges_file, errors='replace', bos_token='<s>', eos_token='</s>', sep_token='</s>', cls_token='<s>', unk_token='<unk>', pad_token='<pad>', mask_token='<mask>', add_prefix_space=False, **kwargs)[source]¶
- Construct a Blenderbot tokenizer.
Blenderbot is nearly identical to RobertaTokenizer and runs end-to-end tokenization: punctuation splitting and wordpiece. The only difference is that it doesn’t add the BOS token to the beginning of sequences.
Refer to superclass RobertaTokenizer for usage examples and documentation concerning parameters.
build_inputs_with_special_tokens(token_ids_0: List[int], token_ids_1: List[int] = None)[source]¶
- Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and adding special tokens. A Blenderbot sequence has the following format:
- single sequence: `X </s>`
 - Parameters
- token_ids_0 (List[int]) – List of IDs to which the special tokens will be added.
- token_ids_1 (List[int], optional) – Will be ignored.
 
- Returns
- List of input IDs with the appropriate special tokens. 
- Return type
- List[int]
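Given the format above, the effect of this method can be checked directly; a minimal sketch (the checkpoint name follows the earlier examples):

>>> from transformers import BlenderbotTokenizer
>>> tokenizer = BlenderbotTokenizer.from_pretrained("facebook/blenderbot-3B")
>>> ids = tokenizer.encode("Sample input", add_special_tokens=False)
>>> tokenizer.build_inputs_with_special_tokens(ids)  # ids followed by tokenizer.eos_token_id, i.e. `X </s>`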
 
 
BlenderbotSmallTokenizer¶
class transformers.BlenderbotSmallTokenizer(vocab_file, merges_file, bos_token='__start__', eos_token='__end__', unk_token='__unk__', pad_token='__null__', **kwargs)[source]¶
- Constructs a Blenderbot-90M tokenizer based on BPE (Byte-Pair-Encoding).
This tokenizer inherits from PreTrainedTokenizer which contains most of the main methods. Users should refer to the superclass for more information regarding methods.
Parameters
- vocab_file (str) – File containing the vocabulary.
- merges_file (str) – Path to the merges file.
- bos_token (str, optional, defaults to "__start__") – The beginning of sentence token.
- eos_token (str, optional, defaults to "__end__") – The end of sentence token.
- unk_token (str, optional, defaults to "__unk__") – The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.
- pad_token (str, optional, defaults to "__null__") – The token used for padding, for example when batching sequences of different lengths.
- **kwargs – Additional keyword arguments passed along to PreTrainedTokenizer.
 
convert_tokens_to_string(tokens: List[str]) → str[source]¶
- Converts a sequence of tokens into a single string. 
get_vocab() → Dict[source]¶
- Returns the vocabulary as a dictionary of token to index. tokenizer.get_vocab()[token] is equivalent to tokenizer.convert_tokens_to_ids(token) when token is in the vocab.
Returns
- The vocabulary. 
- Return type
- Dict[str, int]
 
save_vocabulary(save_directory: str, filename_prefix: Optional[str] = None) → Tuple[str][source]¶
- Save only the vocabulary of the tokenizer (vocabulary + added tokens).
This method won’t save the configuration and special token mappings of the tokenizer. Use _save_pretrained() to save the whole state of the tokenizer.
Parameters
- save_directory (str) – The directory in which to save the vocabulary.
- filename_prefix (str, optional) – An optional prefix to add to the names of the saved files.
 
- Returns
- Paths to the files saved. 
- Return type
- Tuple(str)
 
property vocab_size¶
- Size of the base vocabulary (without the added tokens).
Type
- int
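A short sketch tying these methods together (the temporary directory is purely illustrative):

>>> import tempfile
>>> from transformers import BlenderbotSmallTokenizer
>>> small_tok = BlenderbotSmallTokenizer.from_pretrained("facebook/blenderbot-90M")
>>> small_tok.vocab_size  # size of the base vocabulary
>>> len(small_tok.get_vocab())  # base vocabulary plus any added tokens
>>> with tempfile.TemporaryDirectory() as tmp:
...     vocab_files = small_tok.save_vocabulary(tmp)  # paths to the saved vocabulary and merges files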
 
 
BlenderbotForConditionalGeneration¶
See transformers.BartForConditionalGeneration for arguments to forward and generate.
class transformers.BlenderbotForConditionalGeneration(config: transformers.models.bart.configuration_bart.BartConfig)[source]¶
- The BART Model with a language modeling head. Can be used for summarization.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
This class overrides BartForConditionalGeneration. Please check the superclass for the appropriate documentation alongside usage examples.
adjust_logits_during_generation(logits, cur_len, max_length)[source]¶
- Implement in subclasses of PreTrainedModel for custom behavior to adjust the logits in the generate method.
config_class¶
- alias of transformers.models.blenderbot.configuration_blenderbot.BlenderbotConfig
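Since forward and generate are inherited from BartForConditionalGeneration, the usual generation arguments apply. A sketch reusing model, tokenizer and inputs from the usage section above:

>>> reply_ids = model.generate(**inputs, num_beams=4, max_length=60)
>>> print(tokenizer.batch_decode(reply_ids, skip_special_tokens=True))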
 
TFBlenderbotForConditionalGeneration¶
See transformers.TFBartForConditionalGeneration for arguments to forward and generate.
class transformers.TFBlenderbotForConditionalGeneration(*args, **kwargs)[source]¶
- Blenderbot model for open domain dialogue.
This model inherits from TFBartForConditionalGeneration. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a tf.keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.
Note
TF 2.0 models accept two formats as inputs:
- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.
This second option is useful when using the tf.keras.Model.fit() method, which currently requires having all the tensors in the first argument of the model call function: model(inputs).
If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument:
- a single Tensor with input_ids only and nothing else: model(input_ids)
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})
Each of these formats is demonstrated in the sketch below.
 - Parameters
- config (BlenderbotConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
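A sketch of the three input formats, assuming TF weights are available for the checkpoint (otherwise from_pretrained accepts from_pt=True to convert the PyTorch weights):

>>> from transformers import BlenderbotTokenizer, TFBlenderbotForConditionalGeneration
>>> mname = 'facebook/blenderbot-3B'
>>> tf_model = TFBlenderbotForConditionalGeneration.from_pretrained(mname)
>>> tf_tokenizer = BlenderbotTokenizer.from_pretrained(mname)
>>> tf_inputs = tf_tokenizer(["My friends are cool."], return_tensors='tf')
>>> out = tf_model(tf_inputs)  # a dict in the first positional argument
>>> out = tf_model(tf_inputs['input_ids'])  # a single Tensor with input_ids only
>>> out = tf_model(input_ids=tf_inputs['input_ids'], attention_mask=tf_inputs['attention_mask'])  # keyword arguments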
adjust_logits_during_generation(logits, cur_len, max_length)[source]¶
- Never predict pad_token_id. Predict </s> when max_length is reached. 
config_class¶
- alias of transformers.models.blenderbot.configuration_blenderbot.BlenderbotConfig