MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool Paper • 2406.17565 • Published Jun 25, 2024
The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving Paper • 2405.11299 • Published May 18, 2024