---
license: apache-2.0
library_name: scikit-learn
tags:
- quantile-regression
- scheduling
- systems
- llm-serving
- nsdi
---

# JITServe QRF Length Predictor

This repository provides the **pretrained QRF (Quantile Regression Forest) length predictor** used by **[JITServe (NSDI’26)](https://arxiv.org/abs/2504.20068)** to estimate conservative upper bounds on LLM output lengths.

This predictor is:

- **Not an LLM evaluation model**
- **Not fine-tuned during inference**
- A lightweight **offline-trained prediction model** used solely for scheduling decisions

It is released to ensure **full reproducibility** of the JITServe artifact.

---

## What Is Included

This repository contains two components that must be used together:

```text
qrf_model/
├── 0_qrf_lmsys_chat_llama3_8b.pkl
└── 0_qrf_lmsys_chat_qwen25_7b.pkl
qrf_vectorizer/
├── 0_qrf_lmsys_chat_llama3_8b.pkl
└── 0_qrf_lmsys_chat_qwen25_7b.pkl
```

## Usage

These artifacts are consumed by JITServe at runtime.

Expected directory layout in the JITServe artifact:

```
assets/qrf/
├── qrf_model/
└── qrf_vectorizer/
```

After downloading this repository, place its contents under the path above. JITServe loads the predictor automatically during startup and requires no additional configuration by default.

## Citation

If you use these artifacts, please consider citing our paper:

```
@misc{zhang2025jitservesloawarellmserving,
  title={JITServe: SLO-aware LLM Serving with Imprecise Request Information},
  author={Wei Zhang and Zhiyu Wu and Yi Mu and Rui Ning and Banruo Liu and Nikhil Sarda and Myungjin Lee and Fan Lai},
  year={2025},
  eprint={2504.20068},
  archivePrefix={arXiv},
  primaryClass={cs.DC},
  url={https://arxiv.org/abs/2504.20068},
}
```
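## Checking the Layout

Before starting JITServe, it can be useful to confirm that the files were placed as described under Usage. The sketch below is not part of JITServe: `find_predictor_pairs` is a hypothetical helper. It assumes that a model and its vectorizer are matched by identical filename, as the listing under "What Is Included" suggests. It only inspects paths and does not load the `.pkl` files.

```python
from pathlib import Path

# Subdirectory names taken from the layout shown under Usage.
QRF_SUBDIRS = ("qrf_model", "qrf_vectorizer")


def find_predictor_pairs(qrf_root):
    """Return {filename: (model_path, vectorizer_path)} for each paired .pkl file.

    Hypothetical helper for illustration only. Assumes a model and its
    vectorizer share the same filename. Raises FileNotFoundError if either
    subdirectory is missing and ValueError if a file has no counterpart.
    """
    root = Path(qrf_root)
    model_dir, vec_dir = (root / name for name in QRF_SUBDIRS)
    for directory in (model_dir, vec_dir):
        if not directory.is_dir():
            raise FileNotFoundError(f"missing directory: {directory}")

    models = {p.name for p in model_dir.glob("*.pkl")}
    vectorizers = {p.name for p in vec_dir.glob("*.pkl")}

    # Symmetric difference: names present in one directory but not the other.
    unpaired = sorted(models ^ vectorizers)
    if unpaired:
        raise ValueError(f"files without a counterpart: {unpaired}")

    return {name: (model_dir / name, vec_dir / name) for name in sorted(models)}
```

For example, `find_predictor_pairs("assets/qrf")` run from the root of the JITServe artifact would return one entry per predictor, or raise if a model or vectorizer file is missing.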