---
library_name: transformers
license: apache-2.0
base_model: kakaocorp/kanana-1.5-2.1b-instruct-2505
tags:
- axolotl
- generated_from_trainer
datasets:
- train.jsonl
model-index:
- name: fc-proj1-test01
  results: []
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.10.0`
```yaml
# base_model: mistralai/Mistral-Nemo-Base-2407
base_model: kakaocorp/kanana-1.5-2.1b-instruct-2505
# Enable to use mistral-common tokenizer
# tokenizer_use_mistral_common: true

# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name

load_in_8bit: false
load_in_4bit: false

# datasets:
#   - path: fozziethebeat/alpaca_messages_2k_test
#     type: chat_template

datasets:
  - path: train.jsonl
    type: chat_template

dataset_prepared_path: preprocess
val_set_size: 0.01
output_dir: ./outputs

dataloader_num_workers: 56

adapter:
# adapter: lora
lora_model_dir:

# lora_r: 32
# lora_alpha: 16
# lora_dropout: 0.05
# lora_target_linear: true
# lora_target_modules:
#   - gate_proj
#   - down_proj
#   - up_proj
#   - q_proj
#   - v_proj
#   - k_proj
#   - o_proj

# lora_mlp_kernel: true
# lora_qkv_kernel: true
# lora_o_kernel: true

sequence_len: 8192
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

wandb_project: fastcampus
wandb_entity:
wandb_watch:
wandb_name: fc-proj1-test01
wandb_log_model:

hub_model_id: amphora/fc-proj1-test01

gradient_accumulation_steps: 4
micro_batch_size: 16
num_epochs: 3
optimizer: adamw_torch_fused
# optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 2e-5

bf16: auto
tf32: false

# torch_compile: auto
# torch_compile_backend: inductor

gradient_checkpointing:
resume_from_checkpoint:
logging_steps: 1
flash_attention: true
# flash_attn_rms_norm: true
# flash_attn_cross_entropy: true
# flash_attn_fuse_qkv: true
flash_attn_fuse_mlp: true

warmup_ratio: 0.05
# warmup_steps: 10
weight_decay: 0.01

evals_per_epoch: 0
saves_per_epoch: 1

# deepspeed: deepspeed_configs/zero3_bf16.json

# fsdp:
# #  - shard_grad_ops
#   - full_shard
#   - auto_wrap
# fsdp_config:
#   fsdp_state_dict_type: FULL_STATE_DICT
#   fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
#   fsdp_activation_checkpointing: true

fsdp:
  # - shard_grad_ops
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_activation_checkpointing: true
```

</details><br>
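The config reads training data from a local `train.jsonl` with `type: chat_template`. The dataset itself is not included in this repository; purely as an illustration (and assuming axolotl's default `messages` schema of role/content turns, which is an assumption, not a published detail of this run), one line of such a file could be produced like this:

```python
# Hypothetical example of a single train.jsonl record, assuming the default
# "messages" schema used by axolotl's `chat_template` dataset type.
# The actual training data for this model is not published here.
import json

example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this model card in one sentence."},
        {"role": "assistant", "content": "It describes a fine-tune of kanana-1.5-2.1b-instruct-2505."},
    ]
}

# Append one JSON object per line (JSONL format).
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")
```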

# fc-proj1-test01

This model is a fine-tuned version of [kakaocorp/kanana-1.5-2.1b-instruct-2505](https://huggingface.co/kakaocorp/kanana-1.5-2.1b-instruct-2505) on the train.jsonl dataset.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 32
- optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 43
- training_steps: 860

### Training results

No intermediate evaluation results were logged (the config sets `evals_per_epoch: 0`).

### Framework versions

- Transformers 4.52.3
- PyTorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.2
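A minimal inference sketch follows, not an official usage example: it assumes the published checkpoint `amphora/fc-proj1-test01` loads with the standard `AutoTokenizer`/`AutoModelForCausalLM` classes and inherits a chat template from its base model.

```python
# Minimal sketch under the assumptions above; adjust generation settings as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amphora/fc-proj1-test01"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # training ran in bf16
    device_map="auto",           # requires accelerate
)

messages = [{"role": "user", "content": "Introduce yourself in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```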