Qwen3-VL-Outpost
Qwen3-VL / Qwen2.5-VL
Comprehensive Demo of Multimodal VLMs on the Hub
Qwen3-VL / Qwen2.5-VL
nanonets2 / dots.ocr / olmOCR2 / chandraOCR
olmocr / nanonets ocr2 / qwen2vl ocr / aya vision / rolmocr
Chat using Qwen3-VL for Image, Video, PDF, and GIF
Florence-2-large / Florence-2-base
for document parsing tasks
OCR, VQA, Thinking and Object Detection.
nanonets ocr / smoldocling / monkey ocr / typhoon ocr
cosmos reason1 / docscopeocr / visionocr / captioner relaxed
Vision-Language Models for Document Conversion
Experiment with the Tiny VLMs here
camel doc ocr / core ocr / docscope ocr / monkey ocr
deepcaption / skycaptioner / spacethinker / spaceom / coreocr
thinking / ocr / reasoning