Nemo base model pretrained on 2 billion of a planned 4 billion tokens. Intended for additional conversational/instruct tuning.
Eval Loss: 1.95439 -> 1.92584
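
For readers who prefer perplexity, the eval loss above can be converted with a short sketch like the one below. It assumes the reported loss is mean per-token cross-entropy in nats, which the card does not state; the function name `perplexity` is only illustrative.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss_nats)


# Eval loss values reported in this card (start and end of the run).
loss_start = 1.95439
loss_end = 1.92584

ppl_start = perplexity(loss_start)
ppl_end = perplexity(loss_end)

# Relative perplexity reduction over the run.
relative_drop = 1.0 - ppl_end / ppl_start

print(f"perplexity: {ppl_start:.2f} -> {ppl_end:.2f}")
print(f"relative reduction: {relative_drop:.1%}")
```

If the loss were logged in bits or averaged differently, the conversion would need to change accordingly.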
Stopped this run at 12k steps out of about 21k. The main issue was the DCLM dataset, which is too low quality for such a small training job. I'll come back to this with higher-quality data.
