Mangosteen, a 47-billion-token Thai corpus built with a Thai-adapted pipeline, improves language model performance on Thai benchmarks.
Wannaphong Phatthiyaphaibun PRO
wannaphong
Recent Activity
Liked a dataset 2 days ago: opendatalab/WanJuanSiLu-Multimodal-5Languages
Models (57)
wannaphong/all-base-model
wannaphong/thaidolma-cc_quality_dedup
wannaphong/thaidolma-cc_og
wannaphong/SmolLM3-3B-Base-Thai-v0.1 • Text Generation • 3B • 18
wannaphong/cat-tokenizer
wannaphong/mangosteen-gpt2-lab
wannaphong/mangosteen-cpt-checkpoint
wannaphong/thai-dolma-fasttext-model
wannaphong/typhoon2-qwen2.5-7b-instruct-1M • Text Generation • 8B
wannaphong/Roman2Thai-transliterator • Translation • 77.5M • 6
Datasets (67)
wannaphong/ChoronoCall-Q-AI_Builders-2025 • 2.04k • 10
wannaphong/thwiki-20251001-cleaned • 154k • 20
wannaphong/simplewiki-20251001-cleaned • 212k • 19
wannaphong/thwikiquote-20251001-cleaned • 704 • 22
wannaphong/thwikibooks-20251001-cleaned • 1.08k • 34
wannaphong/thwikisource-20251001-cleaned • 4.03k • 13
wannaphong/prachathai67k • 67.9k • 27
wannaphong/simplewiki-20250920-cleaned • 212k • 36
wannaphong/thwiki-20250920-cleaned • 153k • 64
wannaphong/thwikiquote-20250920-cleaned • 665 • 59