DeepSeek OCR Experimental
Testing for the latest transformers (DeepSeek-OCR).
Comprehensive Demo of Multimodal VLMs on the Hub
Qwen3-VL / Qwen2.5-VL
nanonets ocr2 / olmocr / qwen2vl ocr / aya vision / rolmocr
nanonets2-ocr / chandra-ocr / dots.ocr / olm-ocr2
object detection, visual grounding, keypoint detection
Florence-2 vision models demo (transformers).
Florence-2-large / Florence-2-base
for document parsing tasks
OCR, VQA, Thinking and Object Detection.
nanonets ocr / smoldocling / monkey ocr / typhoon ocr
cosmos reason1 / docscopeocr / visionocr / captioner relaxed
Vision-Language Models for Document Conversion
Experiment with the Tiny VLMs here
camel doc ocr / core ocr / docscope ocr / monkey ocr
deepcaption / skycaptioner / spacethinker / spaceom / coreocr
thinking / ocr / reasoning