---
library_name: transformers
tags:
- gemma3
- instruct
- mamaylm
- insait
license: gemma
language:
- uk
- en
base_model:
- google/gemma-3-12b-it
- google/gemma-3-12b-pt
pipeline_tag: image-text-to-text
datasets:
- Goader/kobza
- HuggingFaceFW/fineweb-2
- HPLT/HPLT2.0_cleaned
- wikimedia/wikipedia
- HuggingFaceTB/smoltalk2
- open-r1/Mixture-of-Thoughts
---

# INSAIT-Institute/MamayLM-Gemma-3-12B-IT-v1.0

![image/png](https://cdn-uploads.huggingface.co/production/uploads/637e1f8cf7e01589cc17bf7e/p6d0YFHjWCQ3S12jWqO1m.png)

INSAIT introduces **MamayLM-Gemma-3-12B-IT-v1.0**, the best-performing Ukrainian language model based on **google/gemma-3-12b** and **google/gemma-3-12b-it**. MamayLM-Gemma-3-12B-IT-v1.0 is **free to use** and distributed under the [Gemma Terms of Use](https://ai.google.dev/gemma/terms). This model was created by [`INSAIT`](https://insait.ai/), part of Sofia University St. Kliment Ohridski, in Sofia, Bulgaria.

# Model description

The model was built on top of Google’s Gemma 3 12B open models. It was continuously pre-trained on a large pre-filtered dataset using a combination of data mixing and model merging, allowing the model to gain outstanding Ukrainian cultural and linguistic capabilities while retaining its English performance. During the pre-training stage, we used various datasets, including Ukrainian web crawl data (Kobza), freely available datasets such as Wikipedia, a range of specialized Ukrainian datasets, and machine translations of popular English datasets. The model was then instruction-fine-tuned on a newly constructed Ukrainian instruction dataset, created from machine translations of the current best English datasets and from specialized Ukrainian datasets prepared by the Ukrainian community.

For more information, check our [blog post](http://blog.mamaylm.insait.ai) (available in English and Ukrainian).
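The card names model merging but does not say which merging method was used. Purely as a generic illustration of the idea, and not the recipe INSAIT applied, the sketch below linearly interpolates two checkpoints that share the same architecture (for example, an original model and a continuously pre-trained one). The `linear_merge` helper and its `alpha` weight are hypothetical, and parameters are modeled as plain Python lists rather than framework tensors so the example stays self-contained.

```python
from typing import Dict, List

# Hypothetical representation: each checkpoint maps parameter names to flat
# lists of floats. Real checkpoints (e.g. PyTorch state dicts) hold tensors.
StateDict = Dict[str, List[float]]


def linear_merge(base: StateDict, tuned: StateDict, alpha: float) -> StateDict:
    """Return (1 - alpha) * base + alpha * tuned, parameter by parameter.

    Illustration only: plain linear interpolation, one of the simplest
    merging schemes, not MamayLM's actual merging method.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if base.keys() != tuned.keys():
        raise ValueError("checkpoints must have identical parameter names")
    merged: StateDict = {}
    for name, base_values in base.items():
        tuned_values = tuned[name]
        if len(base_values) != len(tuned_values):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise weighted average of the two checkpoints
        merged[name] = [
            (1.0 - alpha) * b + alpha * t for b, t in zip(base_values, tuned_values)
        ]
    return merged


if __name__ == "__main__":
    original = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
    continued = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
    print(linear_merge(original, continued, alpha=0.5))
    # {'layer.weight': [2.0, 3.0], 'layer.bias': [0.5]}
```

With `alpha=0.5` the merged parameters are the element-wise average; values of `alpha` closer to 1 keep more of the second checkpoint.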
# Benchmarks and Results

![image/png](https://cdn-uploads.huggingface.co/production/uploads/650ed7adf141bc34f91a12ae/viINoBT15cG5AxU5xFPgz.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/650ed7adf141bc34f91a12ae/GvCFFl2NQVxxnRa3pjD2V.png)

We evaluate our models on a set of standard English benchmarks, their translated versions in Ukrainian, and Ukrainian-specific benchmarks we collected:

- **Winogrande challenge**: testing world knowledge and understanding
- **Hellaswag**: testing sentence completion
- **ARC Easy/Challenge**: testing logical reasoning
- **TriviaQA**: testing trivia knowledge
- **GSM-8k**: solving grade-school math word problems
- **MMLU**: testing knowledge on a multitude of topics
- **IFEval**: testing instruction-following skills
- **ZNO**: testing knowledge of the Ukrainian high school curriculum in Ukrainian language & literature, history, mathematics and geography

These benchmarks test logical reasoning, mathematics, knowledge, language understanding and other skills of the models, and are provided at https://github.com/insait-institute/lm-evaluation-harness-uk.

The graphs above show the performance of MamayLM 12B compared to other large open models. The results show the excellent abilities of MamayLM in Ukrainian, which allow it to **outperform much larger models**, including Alibaba’s Qwen 2.5 72B and Meta’s Llama3.1 70B. Finally, our models retain the **excellent English performance** inherited from the original Google Gemma 3 models upon which they are based.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/650ed7adf141bc34f91a12ae/Zc6vtA12ohuX5_S8ETN8Q.png)

MamayLM v1.0 12B also shows improved performance on visual benchmarks such as MMMU and ZNO-Vision (MMZNO):

![image/png](https://cdn-uploads.huggingface.co/production/uploads/650ed7adf141bc34f91a12ae/W0MQUv6OSnEDMCVAD7kLy.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/650ed7adf141bc34f91a12ae/weS08Z8wdbb3mkm3pB75z.png)

# Use in 🤗 Transformers

First, install the latest version of the transformers library:

```
pip install -U 'transformers[torch]'
```

Then load the model in transformers:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "INSAIT-Institute/MamayLM-Gemma-3-12B-IT-v1.0",
    torch_dtype=torch.bfloat16,
    # Requires the flash-attn package; remove this line to use the default attention
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```

# Recommended Parameters

For optimal performance, we recommend the following text-generation parameters, with which we have extensively tested the model:

```python
from transformers import GenerationConfig

generation_params = GenerationConfig(
    max_new_tokens=2048,  # Choose the maximum number of generated tokens
    temperature=0.1,
    top_k=25,
    top_p=1,
    repetition_penalty=1.1,
    # eos_token_id=[1, 106],
    do_sample=True,
)
```

In principle, higher temperatures should also work adequately.

# Instruction format

To leverage instruction fine-tuning, your prompt should begin with the beginning-of-sequence token `<bos>` and be formatted with the Gemma 3 chat template. `<bos>` should only appear as the first token in a chat sequence.

E.g.

```
<bos><start_of_turn>user
Хто такий Козак Мамай?<end_of_turn>
<start_of_turn>model
```

This format is also available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating) via the `apply_chat_template()` method:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "INSAIT-Institute/MamayLM-Gemma-3-12B-IT-v1.0",
    use_default_system_prompt=False,
)

messages = [
    {"role": "user", "content": "Хто такий Козак Мамай?"},
]

# With return_dict=True this returns input_ids and attention_mask;
# move them to the device the model was loaded on
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
    return_dict=True,
).to(model.device)

outputs = model.generate(
    **inputs,
    generation_config=generation_params,
)
print(tokenizer.decode(outputs[0]))
```

# Use with vLLM

Example usage with vLLM:

```python
from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "INSAIT-Institute/MamayLM-Gemma-3-12B-IT-v1.0",
    use_default_system_prompt=False,
)

sampling_params = SamplingParams(
    max_tokens=2048,
    temperature=0.1,
    top_k=25,
    top_p=1,
    repetition_penalty=1.1,
    stop_token_ids=[1, 106],
)

llm = LLM(
    model="INSAIT-Institute/MamayLM-Gemma-3-12B-IT-v1.0",
    dtype="bfloat16",
    # enforce_eager=True
)

messages = [
    {"role": "user", "content": "Хто такий Козак Мамай?"},
]

# The chat template already inserts <bos>, so skip special tokens when tokenizing
formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
input_ids = tokenizer(
    formatted_prompt,
    add_special_tokens=False,
).input_ids

prompt = TokensPrompt(prompt_token_ids=input_ids)

output = llm.generate(
    prompt,
    sampling_params,
)

generated_text = output[0].outputs[0].text
print(generated_text)
```

# Use with GGML / llama.cpp

The model and instructions for usage in GGUF format are available at [INSAIT-Institute/MamayLM-Gemma-3-12B-IT-v1.0-GGUF](https://huggingface.co/INSAIT-Institute/MamayLM-Gemma-3-12B-IT-v1.0-GGUF).

# Community Feedback

We welcome feedback from the community to help improve MamayLM.
If you have suggestions, encounter any issues, or have ideas for improvements, please:

- Share your experience using the model through Hugging Face's community discussion feature, or
- Contact us at [contact@insait.ai](mailto:contact@insait.ai)

Your real-world usage and insights are valuable in helping us optimize the model's performance and behaviour for various use cases.

# Summary

- **Finetuned from:** [google/gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it); [google/gemma-3-12b-pt](https://huggingface.co/google/gemma-3-12b-pt)
- **Model type:** Causal decoder-only transformer language model
- **Language:** Ukrainian and English
- **Contact:** [contact@insait.ai](mailto:contact@insait.ai)
- **License:** MamayLM is distributed under the [Gemma Terms of Use](https://huggingface.co/INSAIT-Institute/MamayLM-Gemma-3-12B-IT-v1.0/raw/main/LICENSE)