Lexiq Reader 3B

Fine-tuned from Jina AI's ReaderLM-v2

Overview

Lexiq Reader 3B is a specialized 1.5B-parameter language model optimized for converting raw HTML into clean, structured Markdown and JSON. It is fine-tuned from Jina AI's ReaderLM-v2 for more reliable performance in document-processing pipelines.

Model Details

  • Base Model: ReaderLM-v2 (Qwen2.5-1.5B architecture)
  • Parameters: 1.54B (BF16 safetensors weights)
  • Context Window: Up to 512K tokens
  • Supported Languages: 29 languages including English, Chinese, Japanese, Korean, French, Spanish, Portuguese, German, Italian, Russian, Vietnamese, Thai, Arabic
  • License: CC-BY-NC-4.0

Key Features

  • HTML to Markdown: Converts complex HTML with tables, lists, code blocks, and LaTeX
  • HTML to JSON: Direct extraction using predefined schemas (example after Quick Start)
  • Long Context: Handles documents up to 512K tokens
  • Multilingual: Comprehensive support across 29 languages
  • Optimized for Production: Enhanced stability for long-form content generation

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # or "cpu"
tokenizer = AutoTokenizer.from_pretrained("remodlai/lexiq-reader-3b")
model = AutoModelForCausalLM.from_pretrained("remodlai/lexiq-reader-3b").to(device)

# Create prompt
html = "<html><body><h1>Hello, world!</h1></body></html>"
messages = [{"role": "user", "content": f"Extract the main content from the given HTML and convert it to Markdown format.\n```html\n{html}\n```"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate (greedy decoding; setting temperature has no effect when do_sample=False)
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=1024, do_sample=False, repetition_penalty=1.08)
# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
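
For HTML-to-JSON extraction, the base ReaderLM-v2 expects the target schema inside the prompt. The instruction wording and schema below are a minimal sketch following the base model's documented pattern, not necessarily the exact prompt this fine-tune was trained on; generation then proceeds exactly as in the Markdown example above.

import json

schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "date": {"type": "string"},
    },
}
html = "<html><body><h1>Hello</h1><p>By Jane Doe, 2024-01-01</p></body></html>"
instruction = "Extract the specified information from the HTML and present it in a JSON format."
messages = [{"role": "user", "content": (
    f"{instruction}\n```html\n{html}\n```\n"
    f"The JSON schema is as follows:\n```json\n{json.dumps(schema, indent=2)}\n```"
)}]
# Apply the chat template and call model.generate exactly as above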

Fine-tuning Details

This model has been fine-tuned for:

  • Enhanced document structure preservation
  • Improved handling of technical documentation
  • Better extraction of code snippets and API documentation
  • Use in multimodal RAG pipelines

Deployment

Modal

See the modal/ directory for serverless deployment examples with auto-scaling.
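
A minimal sketch of such an endpoint, assuming Modal's current App/Image API; the app name, GPU choice, and generation settings are illustrative and may not match the repo's actual modal/ scripts, which likely cache the model across requests rather than loading it per call as done here for brevity.

import modal

image = modal.Image.debian_slim().pip_install("transformers", "torch", "accelerate")
app = modal.App("lexiq-reader", image=image)  # hypothetical app name

@app.function(gpu="A10G", timeout=600)
def html_to_markdown(html: str) -> str:
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("remodlai/lexiq-reader-3b")
    model = AutoModelForCausalLM.from_pretrained("remodlai/lexiq-reader-3b").to("cuda")
    messages = [{"role": "user", "content": f"Extract the main content from the given HTML and convert it to Markdown format.\n```html\n{html}\n```"}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer.encode(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(inputs, max_new_tokens=4096, do_sample=False, repetition_penalty=1.08)
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# Invoke from a local entrypoint with html_to_markdown.remote(html)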

vLLM

For high-throughput inference:

from vllm import LLM, SamplingParams

llm = LLM(model="remodlai/lexiq-reader-3b", max_model_len=256000, dtype='float16')
sampling_params = SamplingParams(temperature=0, top_k=1, max_tokens=8192)
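
To complete the example, build the prompt with the model's chat template (as in Quick Start) and batch documents through LLM.generate; the calls below are standard vLLM offline-inference API.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("remodlai/lexiq-reader-3b")
html = "<html><body><h1>Hello, world!</h1></body></html>"
messages = [{"role": "user", "content": f"Extract the main content from the given HTML and convert it to Markdown format.\n```html\n{html}\n```"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Pass a list of prompts to batch many documents in one call
results = llm.generate([prompt], sampling_params)
print(results[0].outputs[0].text)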

Hardware Requirements

  • Minimum: T4 GPU (16GB VRAM)
  • Recommended: RTX 3090/4090 or A10G for optimal performance
  • Memory Usage: ~3GB model weights + KV cache (rough estimate below)
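
The weights figure follows from the parameter count (1.54B × 2 bytes in BF16 ≈ 2.9 GB); the KV-cache estimate below assumes the Qwen2.5-1.5B attention layout (28 layers, 2 KV heads of head dim 128), so treat it as a rough sketch.

# BF16 weights: 2 bytes per parameter
weights_gb = 1.54e9 * 2 / 1024**3                       # ~2.9 GB

# KV cache per token: K and V, 28 layers, 2 KV heads, head dim 128, 2 bytes (BF16)
kv_bytes_per_token = 2 * 28 * 2 * 128 * 2               # ~28 KB
kv_gb_at_128k = kv_bytes_per_token * 128_000 / 1024**3  # ~3.4 GB
print(f"{weights_gb:.1f} GB weights + {kv_gb_at_128k:.1f} GB KV cache at 128K context")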

Credits

This model is based on ReaderLM-v2 by Jina AI.

License

CC-BY-NC-4.0 - Non-commercial use only
