Update README.md
        README.md
    CHANGED
    
    | 
         @@ -141,60 +141,6 @@ for res in output: 
     | 
|
| 141 | 
         | 
| 142 | 
         
             
            **For more usage details and parameter explanations, see the [documentation](https://www.paddleocr.ai/latest/en/version3.x/pipeline_usage/PaddleOCR-VL.html).**
         
     | 
| 143 | 
         | 
| 144 | 
         
            -
            ## PaddleOCR-VL-0.9B Usage with transformers
         
     | 
| 145 | 
         
            -
             
     | 
| 146 | 
         
            -
             
     | 
| 147 | 
         
            -
            The PaddleOCR-VL-0.9B model can currently be run with the `transformers` library for element-level recognition of text, formulas, tables, and charts. Full document parsing with `transformers` is planned. The script below is a minimal inference example.
         
     | 
| 148 | 
         
            -
             
     | 
| 149 | 
         
            -
            > [!NOTE]
         
     | 
| 150 | 
         
            -
            > We currently recommend the official method for inference: it is faster and supports page-level document parsing. The example code below supports only element-level recognition.
         
     | 
| 151 | 
         
            -
             
     | 
| 152 | 
         
            -
             
     | 
| 153 | 
         
            -
            ```python
         
     | 
| 154 | 
         
            -
            from PIL import Image
         
     | 
| 155 | 
         
            -
            import torch
         
     | 
| 156 | 
         
            -
            from transformers import AutoModelForCausalLM, AutoProcessor
         
     | 
| 157 | 
         
            -
             
     | 
| 158 | 
         
            -
            DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
         
     | 
| 159 | 
         
            -
             
     | 
| 160 | 
         
            -
            CHOSEN_TASK = "ocr"  # Options: 'ocr' | 'table' | 'chart' | 'formula'
         
     | 
| 161 | 
         
            -
            PROMPTS = {
         
     | 
| 162 | 
         
            -
                "ocr": "OCR:",
         
     | 
| 163 | 
         
            -
                "table": "Table Recognition:",
         
     | 
| 164 | 
         
            -
                "formula": "Formula Recognition:",
         
     | 
| 165 | 
         
            -
                "chart": "Chart Recognition:",
         
     | 
| 166 | 
         
            -
            }
         
     | 
| 167 | 
         
            -
             
     | 
| 168 | 
         
            -
            model_path = "PaddlePaddle/PaddleOCR-VL"
         
     | 
| 169 | 
         
            -
            image_path = "test.png"
         
     | 
| 170 | 
         
            -
            image = Image.open(image_path).convert("RGB")
         
     | 
| 171 | 
         
            -
             
     | 
| 172 | 
         
            -
            model = AutoModelForCausalLM.from_pretrained(
         
     | 
| 173 | 
         
            -
                model_path, trust_remote_code=True, torch_dtype=torch.bfloat16
         
     | 
| 174 | 
         
            -
            ).to(DEVICE).eval()
         
     | 
| 175 | 
         
            -
            processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
         
     | 
| 176 | 
         
            -
             
     | 
| 177 | 
         
            -
            messages = [
         
     | 
| 178 | 
         
            -
                {"role": "user",         
         
     | 
| 179 | 
         
            -
                 "content": [
         
     | 
| 180 | 
         
            -
                        {"type": "image", "image": image},
         
     | 
| 181 | 
         
            -
                        {"type": "text", "text": PROMPTS[CHOSEN_TASK]},
         
     | 
| 182 | 
         
            -
                    ]
         
     | 
| 183 | 
         
            -
                }
         
     | 
| 184 | 
         
            -
            ]
         
     | 
| 185 | 
         
            -
            inputs = processor.apply_chat_template(
         
     | 
| 186 | 
         
            -
                messages, 
         
     | 
| 187 | 
         
            -
                tokenize=True, 
         
     | 
| 188 | 
         
            -
                add_generation_prompt=True, 	
         
     | 
| 189 | 
         
            -
                return_dict=True,
         
     | 
| 190 | 
         
            -
                return_tensors="pt"
         
     | 
| 191 | 
         
            -
            ).to(DEVICE)
         
     | 
| 192 | 
         
            -
             
     | 
| 193 | 
         
            -
            outputs = model.generate(**inputs, max_new_tokens=1024)
         
     | 
| 194 | 
         
            -
            outputs = processor.batch_decode(outputs, skip_special_tokens=True)[0]
         
     | 
| 195 | 
         
            -
            print(outputs)
         
     | 
| 196 | 
         
            -
            ```
         
     | 
| 197 | 
         
            -
             
     | 
| 198 | 
         
             
            ## Performance
         
     | 
| 199 | 
         | 
| 200 | 
         
             
            ### Page-Level Document Parsing 
         
     | 
| 
         | 
|
| 141 | 
         | 
| 142 | 
         
             
            **For more usage details and parameter explanations, see the [documentation](https://www.paddleocr.ai/latest/en/version3.x/pipeline_usage/PaddleOCR-VL.html).**
         
     | 
| 143 | 
         | 
| 144 | 
         
             
            ## Performance
         
     | 
| 145 | 
         | 
| 146 | 
         
             
            ### Page-Level Document Parsing 
         
     |