Update README.md
README.md CHANGED

@@ -13,6 +13,8 @@ language:
 - en
 - zh
 - multilingual
+base_model:
+- rednote-hilab/dots.ocr
 ---
 
 <div align="center">
@@ -1231,4 +1233,4 @@ We also thank [DocLayNet](https://github.com/DS4SD/DocLayNet), [M6Doc](https://g
 - **Performance Bottleneck:** Despite its 1.7B parameter LLM foundation, **dots.ocr** is not yet optimized for high-throughput processing of large PDF volumes.
 
 We are committed to achieving more accurate table and formula parsing, as well as enhancing the model's OCR capabilities for broader generalization, all while aiming for **a more powerful, more efficient model**. Furthermore, we are actively considering the development of **a more general-purpose perception model** based on Vision-Language Models (VLMs), which would integrate general detection, image captioning, and OCR tasks into a unified framework. **Parsing the content of the pictures in the documents** is also a key priority for our future work.
-We believe that collaboration is the key to tackling these exciting challenges. If you are passionate about advancing the frontiers of document intelligence and are interested in contributing to these future endeavors, we would love to hear from you. Please reach out to us via email at: [[email protected]].
+We believe that collaboration is the key to tackling these exciting challenges. If you are passionate about advancing the frontiers of document intelligence and are interested in contributing to these future endeavors, we would love to hear from you. Please reach out to us via email at: [[email protected]].
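The first hunk of this change extends the model card's YAML frontmatter with a `base_model` declaration, which links the repository to the upstream `rednote-hilab/dots.ocr` model on the Hub. A sketch of the resulting metadata block (only the keys visible in the diff are shown; any other frontmatter keys in the real file are omitted here):

```yaml
---
language:
- en
- zh
- multilingual
base_model:
- rednote-hilab/dots.ocr
---
```

The `base_model` key accepts a list, so a card can reference multiple upstream models; here it records a single base.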