--- license: apache-2.0 base_model: Qwen/Qwen-1_8B-Chat tags: - qwen - lora - text-generation - fine-tuned - hemiplegia - stroke - medical-qa - tiansuan-ai pipeline_tag: text-generation widget: # Example of how the widget could be structured if it were a full model # For LoRA, direct widget use is complex, but this metadata is good practice. - example_title: "Hemiplegia Rehab Question" text: "<|im_start|>system\n你是一个专注于偏瘫、脑血栓、半身不遂领域的医疗问答助手。<|im_end|>\n<|im_start|>user\n偏瘫患者的早期康复锻炼有哪些?<|im_end|>\n<|im_start|>assistant\n" --- # Qwen-1.8B-Chat LoRA for Hemiplegia/Stroke Q&A (Associated with Tiansuan AI) This repository contains LoRA (Low-Rank Adaptation) weights for the `Qwen/Qwen-1_8B-Chat` model. This model was fine-tuned on a small, custom dataset to answer questions related to hemiplegia, cerebral thrombosis (stroke), and related conditions. This fine-tuning experiment is associated with work at Tiansuan AI. ## Model Description This is a LoRA adapter. To use it, you need to load the base model `Qwen/Qwen-1_8B-Chat` and then apply these LoRA weights using the PEFT library. ## Fine-tuning Data The model was fine-tuned on a very small, custom dataset consisting of 5 question-answer pairs specifically designed for the medical domain of hemiplegia and stroke. The training process involved 20 steps. Due to the extremely limited dataset size and training duration, the model's capabilities are primarily for demonstration of the fine-tuning process. It will likely exhibit strong memorization of the training data and limited generalization to unseen questions. ## Intended Use This model is intended for research, educational, and illustrative purposes to demonstrate the LoRA fine-tuning technique for LLMs on specialized, albeit small, datasets. **Crucially, this model is NOT a substitute for professional medical advice, diagnosis, or treatment.** Always seek the advice of your physician or other qualified health provider with any questions you may have regarding a medical condition. The outputs of this model should be critically evaluated and not used for any real-world medical decision-making. ## How to Use with PEFT ```python from peft import PeftModel from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig import torch # Define model IDs base_model_id = "Qwen/Qwen-1_8B-Chat" lora_adapter_id = "jinv2/qwen-1_8b-hemiplegia-lora" # This is your model # Setup quantization configuration (as used during fine-tuning) quantization_config = BitsAndBytesConfig( load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16 # Matches the successful fine-tuning compute dtype ) # Load the base model with quantization print(f"Loading base model: {base_model_id}...") base_model = AutoModelForCausalLM.from_pretrained( base_model_id, quantization_config=quantization_config, trust_remote_code=True, device_map="auto" # Automatically distribute model on available anjing (GPU if available, else CPU) ) print("Base model loaded.") # Load the tokenizer # It's good practice to load tokenizer from the same source as the fine-tuned adapter if uploaded, # or ensure base tokenizer settings (pad_token, etc.) are consistent. print(f"Loading tokenizer from: {lora_adapter_id} (or fallback to {base_model_id})...") try: tokenizer = AutoTokenizer.from_pretrained(lora_adapter_id, trust_remote_code=True) print(f"Successfully loaded tokenizer from {lora_adapter_id}.") except Exception: print(f"Could not load tokenizer from {lora_adapter_id}, falling back to {base_model_id}.") tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True) # Set pad_token if not already set (important for Qwen and generation) if tokenizer.pad_token_id is None: if tokenizer.eos_token_id is not None: tokenizer.pad_token_id = tokenizer.eos_token_id print(f"Set tokenizer.pad_token_id to eos_token_id: {tokenizer.pad_token_id}") else: # Fallback if eos_token_id is also None (should not happen for Qwen) # For Qwen, eos_token_id is typically around 151643 for <|endoftext|> # tokenizer.pad_token_id = 151643 # Example, verify Qwen's actual eos_token_id print("Warning: pad_token_id and eos_token_id are None. Generation might be problematic.") tokenizer.padding_side = "left" # Usually preferred for generation # Load the LoRA adapter onto the base model print(f"Loading LoRA adapter: {lora_adapter_id}...") model = PeftModel.from_pretrained(base_model, lora_adapter_id) model.eval() # Set the model to evaluation mode print("LoRA adapter loaded and model is ready for inference.") # --- Inference Example --- # Since tokenizer.chat_template was 'Not Available' during Colab run, # we manually construct the prompt according to Qwen's ChatML format. system_prompt_content = "你是一个专注于偏瘫、脑血栓、半身不遂领域的医疗问答助手。" user_query_content = "偏瘫患者的早期康复锻炼有哪些?" # A question from your training set prompt = f"<|im_start|>system\n{system_prompt_content}<|im_end|>\n<|im_start|>user\n{user_query_content}<|im_end|>\n<|im_start|>assistant\n" print(f"\nFormatted Prompt:\n{prompt}") inputs = tokenizer(prompt, return_tensors="pt").to(model.device) # Generate response print("Generating response...") with torch.no_grad(): # Inference doesn't need gradient calculation outputs = model.generate( **inputs, max_new_tokens=150, pad_token_id=tokenizer.pad_token_id, # Crucial for generation to avoid warnings/errors eos_token_id=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|im_end|>")] if tokenizer.eos_token_id is not None else None, # Qwen specific EOS handling temperature=0.7, top_p=0.9, do_sample=True ) # Decode and print the response # We need to slice the output to get only the generated part, excluding the input prompt response_text = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True) print(f"\nModel Response:\n{response_text.strip()}") # Example with a new question user_query_new = "中风后如何进行语言恢复训练?" prompt_new = f"<|im_start|>system\n{system_prompt_content}<|im_end|>\n<|im_start|>user\n{user_query_new}<|im_end|>\n<|im_start|>assistant\n" inputs_new = tokenizer(prompt_new, return_tensors="pt").to(model.device) print("\nGenerating response for a new question...") with torch.no_grad(): outputs_new = model.generate( **inputs_new, max_new_tokens=200, pad_token_id=tokenizer.pad_token_id, eos_token_id=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|im_end|>")] if tokenizer.eos_token_id is not None else None, temperature=0.7, top_p=0.9, do_sample=True ) response_text_new = tokenizer.decode(outputs_new[0][inputs_new.input_ids.shape[1]:], skip_special_tokens=True) print(f"\nModel Response (New Question):\n{response_text_new.strip()}") ``` ## License and Attribution The LoRA adapter weights and this model card are made available under the **Apache 2.0 License**. Please see the `LICENSE` file if included, or refer to [Apache 2.0 License details](https://www.apache.org/licenses/LICENSE-2.0). The base model `Qwen/Qwen-1_8B-Chat` is subject to the [Tongyi Qianwen LICENSE AGREEMENT](https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20LICENSE%20AGREEMENT). This fine-tuning experiment is associated with **Tiansuan AI**. For more information, you can visit [https://jinv2.github.io](https://jinv2.github.io). ## Disclaimer The information provided by this model is for general informational and demonstrative purposes only, and **does not constitute medical advice**. Always seek the advice of a qualified health professional for any medical concerns. The outputs of this model are based on a very small dataset and should be critically evaluated. ```