---
license: apache-2.0
language:
- tr
---

# Bert2DModel

[Bert2DModel](https://ieeexplore.ieee.org/document/10542953) is a new take on the classic BERT architecture, built specifically for morphologically rich languages such as Turkish. Regular BERT sees a sentence as a flat sequence of tokens, but in some languages the words themselves carry a lot of internal structure (prefixes, suffixes, and so on). Bert2D addresses this with a "2D embedding" scheme: it encodes both a word's position in the sentence (the first dimension) and the position of each subword piece inside that word (the second dimension). This gives the model a finer-grained view of grammar and meaning, especially for languages in which words can take many inflected forms. This first release is trained on Turkish!

You can find all the original [`Bert2DModel`] checkpoints under the [yigitbekir](https://huggingface.co/yigitbekir) collection.

> [!TIP]
> Click on the [`Bert2DModel`] models in the right sidebar for more examples of how to apply [`Bert2DModel`] to different text and token classification tasks.

The example below demonstrates how to use the `fill-mask` pipeline with `Bert2DModel` or load it directly with the [`AutoModel`] class.

```python
from transformers import pipeline, AutoTokenizer, AutoModelForMaskedLM

# 1. Define the model repository ID
repo_id = "yigitbekir/Bert2D-cased-Turkish-128K-WWM-NSW2"

# 2. Explicitly load the tokenizer and model first.
# Bert2D ships custom code, so trust_remote_code=True is required.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(repo_id, trust_remote_code=True)

# 3. Pass the fully initialized objects directly to the pipeline
# so it does not have to guess how to load the custom architecture.
fill_masker = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# 4. Predict the masked token
masked_sentence = "Adamın mesleği [MASK] midir acaba?"  # "Is the man's profession [MASK], I wonder?"
predictions = fill_masker(masked_sentence)

# 5. Print the results
print("\n--- Predictions ---")
for prediction in predictions:
    print(f"  Sequence: {prediction['sequence']}")
    print(f"  Token: {prediction['token_str']}")
    print(f"  Score: {prediction['score']:.4f}")
    print("-" * 20)

# Expected output:
#   Sequence: Adamın mesleği mühendis midir acaba?
#   Score: 0.2393
# --------------------
#   Sequence: Adamın mesleği doktor midir acaba?
#   Score: 0.1698
# --------------------
```

```python
from transformers import AutoTokenizer, Bert2DModel

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("yigitbekir/Bert2D-cased-Turkish-128K-WWM-NSW2")
model = Bert2DModel.from_pretrained("yigitbekir/Bert2D-cased-Turkish-128K-WWM-NSW2")

# Example text
text = "Türkiye'nin başkenti Ankara'dır."  # "The capital of Türkiye is Ankara."
inputs = tokenizer(text, return_tensors="pt")

# Get model outputs
outputs = model(**inputs)
last_hidden_states = outputs.last_hidden_state
```

```bash
echo -e "Adamın mesleği [MASK] midir acaba?" | transformers run --task fill-mask --model yigitbekir/Bert2D-cased-Turkish-128K-WWM-NSW2 --device 0
```

## Notes

- **Configuration is key:** `Bert2D` introduces configuration parameters that are not present in a standard BERT model. You must use `Bert2DConfig` and be mindful of these settings when training or fine-tuning; otherwise the model will behave unexpectedly. The two key new parameters are `max_word_position_embeddings` and `max_intermediate_subword_position_embeddings`.
```py
from transformers import AutoConfig

# Load the custom config from a pretrained model
# (the custom config class also needs trust_remote_code=True)
config = AutoConfig.from_pretrained(
    "yigitbekir/Bert2D-cased-Turkish-128K-WWM-NSW2", trust_remote_code=True
)

# Access the new parameters
print(f"Max Word Positions: {config.max_word_position_embeddings}")
# Expected output: Max Word Positions: 512
print(f"Intermediate Subword Positions: {config.max_intermediate_subword_position_embeddings}")
# Expected output: Intermediate Subword Positions: 2
```
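To make the "2D" scheme from the introduction concrete, the sketch below derives a (word position, intra-word subword position) pair for each token from the `word_ids()` mapping that a fast tokenizer provides. This is an illustration of the idea only: `two_d_positions` is a hypothetical helper, not part of the library, and the exact id assignment inside Bert2D (including how positions beyond `max_intermediate_subword_position_embeddings` are handled) may differ.

```python
# Illustrative only: compute (word_position, subword_position) pairs from the
# per-token word indices a fast tokenizer's word_ids() returns. The real
# Bert2D embedding layer may assign ids differently.

def two_d_positions(word_ids, max_intermediate=2):
    """Return one (word_pos, subword_pos) pair per token.

    word_ids: per-token word index (None for special tokens such as [CLS]).
    max_intermediate: assumed cap on intra-word positions, loosely mirroring
        max_intermediate_subword_position_embeddings.
    """
    positions = []
    prev = None
    subword_pos = 0
    for wid in word_ids:
        if wid is None:  # special token: give it position (0, 0)
            positions.append((0, 0))
            prev = None
            continue
        # First subword of a new word restarts at 0; later subwords count up,
        # clamped to the assumed cap.
        subword_pos = 0 if wid != prev else min(subword_pos + 1, max_intermediate)
        positions.append((wid + 1, subword_pos))  # word positions start at 1
        prev = wid
    return positions

# A hypothetical token layout for "Türkiye'nin başkenti Ankara'dır.":
# [CLS] Türkiye ##'nin başkenti Ankara ##'dır ##. [SEP]
word_ids = [None, 0, 0, 1, 2, 2, 2, None]
print(two_d_positions(word_ids))
# [(0, 0), (1, 0), (1, 1), (2, 0), (3, 0), (3, 1), (3, 2), (0, 0)]
```

Note how all three pieces of the last word share word position 3 but get distinct subword positions, which is exactly the structure a flat 1D position embedding cannot express.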