Mangosteen, a 47-billion-token Thai corpus built with a Thai-adapted pipeline, improves language model performance on Thai benchmarks.
Wannaphong Phatthiyaphaibun
wannaphong
Recent Activity
updated a dataset about 20 hours ago: wannaphong/thai-text-classification-prompt
published a dataset about 20 hours ago: wannaphong/thai-text-classification-prompt
liked a dataset 5 days ago: opendatalab/WanJuanSiLu-Multimodal-5Languages