fix: upgrade to Qwen2.5-VL-3B with 8bit quantization
Browse files- Replace Qwen2-VL-2B with Qwen2.5-VL-3B for better OCR quality
- Apply 8bit quantization to both models for faster inference
- Add bitsandbytes dependency for quantization support
- Better accuracy with optimized speed
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
- app.py +13 -11
- requirements.txt +1 -0
app.py
CHANGED
|
@@ -8,7 +8,7 @@ import gradio as gr
|
|
| 8 |
import spaces
|
| 9 |
import torch
|
| 10 |
from PIL import Image
|
| 11 |
-
from transformers import
|
| 12 |
from qwen_vl_utils import process_vision_info
|
| 13 |
from huggingface_hub import login
|
| 14 |
|
|
@@ -17,8 +17,8 @@ HF_TOKEN = os.getenv("HF_TOKEN")
|
|
| 17 |
if HF_TOKEN:
|
| 18 |
login(token=HF_TOKEN.strip())
|
| 19 |
|
| 20 |
-
# OCR 모델 ID (
|
| 21 |
-
OCR_MODEL_ID = "Qwen/Qwen2-VL-
|
| 22 |
|
| 23 |
# 약 정보 분석 모델 ID (의료 전문)
|
| 24 |
MED_MODEL_ID = "google/medgemma-4b-it"
|
|
@@ -34,21 +34,23 @@ def load_models():
|
|
| 34 |
global OCR_MODEL, OCR_PROCESSOR, MED_MODEL, MED_TOKENIZER
|
| 35 |
|
| 36 |
if OCR_MODEL is None:
|
| 37 |
-
print("🔄 Loading Qwen2-VL-
|
| 38 |
-
OCR_MODEL =
|
| 39 |
OCR_MODEL_ID,
|
| 40 |
-
torch_dtype=
|
| 41 |
-
device_map="auto"
|
|
|
|
| 42 |
)
|
| 43 |
OCR_PROCESSOR = AutoProcessor.from_pretrained(OCR_MODEL_ID)
|
| 44 |
print("✅ OCR model loaded!")
|
| 45 |
|
| 46 |
if MED_MODEL is None:
|
| 47 |
-
print("🔄 Loading MedGemma-4B for medical analysis...")
|
| 48 |
MED_MODEL = AutoModelForCausalLM.from_pretrained(
|
| 49 |
MED_MODEL_ID,
|
| 50 |
torch_dtype=torch.bfloat16,
|
| 51 |
-
device_map="auto"
|
|
|
|
| 52 |
)
|
| 53 |
MED_TOKENIZER = AutoTokenizer.from_pretrained(MED_MODEL_ID)
|
| 54 |
print("✅ Medical model loaded!")
|
|
@@ -396,8 +398,8 @@ with gr.Blocks(theme=gr.themes.Soft(), css=CUSTOM_CSS) as demo:
|
|
| 396 |
- AI가 생성한 정보이므로 정확하지 않을 수 있습니다
|
| 397 |
|
| 398 |
**🤖 기술 스택**
|
| 399 |
-
- Qwen2-VL-
|
| 400 |
-
- Google MedGemma-4B-IT (의료 전문 모델
|
| 401 |
|
| 402 |
**🔑 설정 방법**
|
| 403 |
- Hugging Face Spaces의 Settings → Repository secrets에서 `HF_TOKEN` 추가 필요
|
|
|
|
| 8 |
import spaces
|
| 9 |
import torch
|
| 10 |
from PIL import Image
|
| 11 |
+
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor, AutoTokenizer, AutoModelForCausalLM
|
| 12 |
from qwen_vl_utils import process_vision_info
|
| 13 |
from huggingface_hub import login
|
| 14 |
|
|
|
|
| 17 |
if HF_TOKEN:
|
| 18 |
login(token=HF_TOKEN.strip())
|
| 19 |
|
| 20 |
+
# OCR 모델 ID (품질 우선)
|
| 21 |
+
OCR_MODEL_ID = "Qwen/Qwen2.5-VL-3B-Instruct"
|
| 22 |
|
| 23 |
# 약 정보 분석 모델 ID (의료 전문)
|
| 24 |
MED_MODEL_ID = "google/medgemma-4b-it"
|
|
|
|
| 34 |
global OCR_MODEL, OCR_PROCESSOR, MED_MODEL, MED_TOKENIZER
|
| 35 |
|
| 36 |
if OCR_MODEL is None:
|
| 37 |
+
print("🔄 Loading Qwen2.5-VL-3B for OCR (8bit quantization)...")
|
| 38 |
+
OCR_MODEL = Qwen2_5_VLForConditionalGeneration.from_pretrained(
|
| 39 |
OCR_MODEL_ID,
|
| 40 |
+
torch_dtype="auto",
|
| 41 |
+
device_map="auto",
|
| 42 |
+
load_in_8bit=True
|
| 43 |
)
|
| 44 |
OCR_PROCESSOR = AutoProcessor.from_pretrained(OCR_MODEL_ID)
|
| 45 |
print("✅ OCR model loaded!")
|
| 46 |
|
| 47 |
if MED_MODEL is None:
|
| 48 |
+
print("🔄 Loading MedGemma-4B for medical analysis (8bit quantization)...")
|
| 49 |
MED_MODEL = AutoModelForCausalLM.from_pretrained(
|
| 50 |
MED_MODEL_ID,
|
| 51 |
torch_dtype=torch.bfloat16,
|
| 52 |
+
device_map="auto",
|
| 53 |
+
load_in_8bit=True
|
| 54 |
)
|
| 55 |
MED_TOKENIZER = AutoTokenizer.from_pretrained(MED_MODEL_ID)
|
| 56 |
print("✅ Medical model loaded!")
|
|
|
|
| 398 |
- AI가 생성한 정보이므로 정확하지 않을 수 있습니다
|
| 399 |
|
| 400 |
**🤖 기술 스택**
|
| 401 |
+
- Qwen2.5-VL-3B-Instruct (8bit 양자화, 고품질 OCR)
|
| 402 |
+
- Google MedGemma-4B-IT (8bit 양자화, 의료 전문 모델)
|
| 403 |
|
| 404 |
**🔑 설정 방법**
|
| 405 |
- Hugging Face Spaces의 Settings → Repository secrets에서 `HF_TOKEN` 추가 필요
|
requirements.txt
CHANGED
|
@@ -7,3 +7,4 @@ numpy
|
|
| 7 |
qwen-vl-utils
|
| 8 |
accelerate
|
| 9 |
huggingface_hub
|
|
|
|
|
|
| 7 |
qwen-vl-utils
|
| 8 |
accelerate
|
| 9 |
huggingface_hub
|
| 10 |
+
bitsandbytes
|