NanoT5 Base Malaysian Translation v2.1

Finetuned from https://huggingface.co/mesolitica/nanot5-base-malaysian-cased with a 2048 context length on 9B tokens of translation data.

  • This model can translate localized text into standard text.
  • This model can also translate in reverse, from standard text into localized text, which is useful for text augmentation.
  • This model can translate code.
  • This model natively handles code-switching.
  • This model should preserve \n, \t and \r as they are.
  • Better Science and Math context translation compared to v2.
  • Better Manglish translation compared to v2.
  • Better Cantonese translation compared to v2.
  • Better Tamil and Tanglish translation compared to v2.
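When reverse translation is used for text augmentation, it can help to confirm that each generated output kept the \n, \t and \r characters the model is expected to preserve. The sketch below is a hypothetical post-processing filter, not part of the model or its training code; the function names `control_char_counts`, `preserves_control_chars` and `filter_augmented_pairs` are illustrative only.

```python
from collections import Counter

# Control characters the model is expected to keep unchanged (per this README).
CONTROL_CHARS = ("\n", "\t", "\r")


def control_char_counts(text: str) -> dict:
    """Count each control character of interest in `text` (hypothetical helper)."""
    counts = Counter(ch for ch in text if ch in CONTROL_CHARS)
    return {ch: counts[ch] for ch in CONTROL_CHARS}


def preserves_control_chars(source: str, translation: str) -> bool:
    """True if the translation has the same number of \\n, \\t and \\r as the source."""
    return control_char_counts(source) == control_char_counts(translation)


def filter_augmented_pairs(pairs):
    """Keep only (source, translation) pairs whose control-character counts match."""
    return [(s, t) for s, t in pairs if preserves_control_chars(s, t)]


if __name__ == "__main__":
    pairs = [
        ("baris satu\nbaris dua", "line one\nline two"),  # newline kept
        ("baris satu\nbaris dua", "line one line two"),   # newline lost
    ]
    print(filter_augmented_pairs(pairs))
```

This only compares counts, not positions, so it is a cheap sanity check rather than a full alignment test.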

Wandb logs: https://wandb.ai/huseinzol05/nanot5-base-malaysian-cased-translation-v6-multipack-post

How to

Follow the same how-to as in the small model README: https://huggingface.co/mesolitica/nanot5-small-malaysian-translation-v2.1#how-to
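For quick reference, a minimal sketch of batched inference with Hugging Face `transformers` is shown below, assuming the base model is used the same way as the small model. The prompt format `terjemah ke {lang}: {text}</s>` and target-language names such as `Melayu` and `Inggeris` are assumptions taken from memory of the small-model README; check them against that README before relying on this. `build_prompt`, `translate` and `max_new_tokens=512` are hypothetical choices, and inputs are truncated to the 2048-token context length mentioned above.

```python
MODEL_ID = "mesolitica/nanot5-base-malaysian-translation-v2.1"


def build_prompt(text: str, target_lang: str, eos_token: str = "</s>") -> str:
    # ASSUMPTION: prefix format copied from the small-model README; verify it there.
    return f"terjemah ke {target_lang}: {text}{eos_token}"


def translate(texts, target_lang, model_id=MODEL_ID, max_new_tokens=512):
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = T5ForConditionalGeneration.from_pretrained(model_id)

    prompts = [build_prompt(t, target_lang, tokenizer.eos_token) for t in texts]
    # The EOS token is already in each prompt, so don't let the tokenizer add another.
    batch = tokenizer(
        prompts,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=2048,  # context length used during finetuning
        add_special_tokens=False,
    )
    outputs = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


if __name__ == "__main__":
    # Downloads the model on first run.
    print(translate(["i tak suka makan nasi lemak"], "Inggeris"))
```

Padding to the longest input in the batch keeps one `generate` call per batch; for very uneven input lengths, sorting texts by length before batching reduces wasted padding.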

Model size: 0.2B params (Safetensors, tensor type F32)