2 changes: 1 addition & 1 deletion benchmark/mmmu/README.md
@@ -5,7 +5,7 @@
 Host the VLM:

 ```
-python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --chat-template qwen2-vl --port 30000
+python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --port 30000
 ```

 It's recommended to reduce the memory usage by appending something like `--mem-fraction-static 0.6` to the command above.
2 changes: 1 addition & 1 deletion benchmark/mmmu/bench_sglang.py
@@ -2,7 +2,7 @@
 Bench the sglang-hosted vLM with benchmark MMMU

 Usage:
-Host the VLM: python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --chat-template qwen2-vl --port 30000
+Host the VLM: python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --port 30000

 Benchmark: python benchmark/mmmu/bench_sglang.py --port 30000 --concurrency 16

13 changes: 4 additions & 9 deletions docs/backend/openai_api_vision.ipynb
@@ -27,11 +27,7 @@
 "source": [
 "## Launch A Server\n",
 "\n",
-"Launch the server in your terminal and wait for it to initialize.\n",
-"\n",
-"**Remember to add** `--chat-template` **for example** `--chat-template=qwen2-vl` **to specify the [vision chat template](https://docs.sglang.ai/backend/openai_api_vision.html#Chat-Template), otherwise, the server will only support text (images won’t be passed in), which can lead to degraded performance.**\n",
-"\n",
-"We need to specify `--chat-template` for vision language models because the chat template provided in Hugging Face tokenizer only supports text."
+"Launch the server in your terminal and wait for it to initialize."
 ]
 },
 {
@@ -51,8 +47,7 @@
 "\n",
 "vision_process, port = launch_server_cmd(\n",
 " \"\"\"\n",
-"python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-VL-7B-Instruct \\\n",
-" --chat-template=qwen2-vl\n",
+"python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-VL-7B-Instruct\n",
 "\"\"\"\n",
 ")\n",
 "\n",
@@ -255,9 +250,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Chat Template\n",
+"## Chat Template (for sglang version < 0.4.6.post2)\n",
 "\n",
-"As mentioned before, if you do not specify a vision model's `--chat-template`, the server uses Hugging Face's default template, which only supports text.\n",
+"If you do not specify a vision model's `--chat-template`, the server uses Hugging Face's default template, which only supports text, and may lead to degraded performance.\n",
 "\n",
 "We list popular vision models with their chat templates:\n",
 "\n",
2 changes: 1 addition & 1 deletion docs/backend/sampling_params.md
@@ -135,7 +135,7 @@ Detailed example in [openai compatible api](https://docs.sglang.ai/backend/opena
 Launch a server:

 ```bash
-python3 -m sglang.launch_server --model-path lmms-lab/llava-onevision-qwen2-7b-ov --chat-template chatml-llava
+python3 -m sglang.launch_server --model-path lmms-lab/llava-onevision-qwen2-7b-ov
 ```

 Download an image:
3 changes: 1 addition & 2 deletions docs/supported_models/embedding_models.md
@@ -3,7 +3,7 @@
 SGLang provides robust support for embedding models by integrating efficient serving mechanisms with its flexible programming interface. This integration allows for streamlined handling of embedding tasks, facilitating faster and more accurate retrieval and semantic search operations. SGLang's architecture enables better resource utilization and reduced latency in embedding model deployment.

 ```{important}
-They are executed with `--is-embedding` and some may require `--trust-remote-code` and/or `--chat-template`
+They are executed with `--is-embedding` and some may require `--trust-remote-code`
 ```

 ## Example launch Command
@@ -13,7 +13,6 @@ python3 -m sglang.launch_server \
 --model-path Alibaba-NLP/gme-Qwen2-VL-2B-Instruct \ # example HF/local path
 --is-embedding \
 --host 0.0.0.0 \
---chat-template gme-qwen2-vl \ # set chat template
 --port 30000 \
 ```

5 changes: 0 additions & 5 deletions docs/supported_models/vision_language_models.md
@@ -2,16 +2,11 @@

 These models accept multi-modal inputs (e.g., images and text) and generate text output. They augment language models with visual encoders and require a specific chat template for handling vision prompts.

-```{important}
-We need to specify `--chat-template` for VLMs because the chat template provided in HuggingFace tokenizer only supports text. If you do not specify a vision model’s `--chat-template`, the server uses HuggingFace’s default template, which only supports text and the images won’t be passed in.
-```
-
 ## Example launch Command

 ```shell
 python3 -m sglang.launch_server \
 --model-path meta-llama/Llama-3.2-11B-Vision-Instruct \ # example HF/local path
---chat-template llama_3_vision \ # required chat template
 --host 0.0.0.0 \
 --port 30000 \
 ```
2 changes: 1 addition & 1 deletion examples/runtime/engine/offline_batch_inference_vlm.py
@@ -1,6 +1,6 @@
 """
 Usage:
-python offline_batch_inference_vlm.py --model-path Qwen/Qwen2-VL-7B-Instruct --chat-template=qwen2-vl
+python offline_batch_inference_vlm.py --model-path Qwen/Qwen2-VL-7B-Instruct
 """

 import argparse
@@ -1,7 +1,7 @@
 """
 Usage:

-python3 -m sglang.launch_server --model-path lmms-lab/llava-onevision-qwen2-72b-ov --port=30000 --tp-size=8 --chat-template=chatml-llava
+python3 -m sglang.launch_server --model-path lmms-lab/llava-onevision-qwen2-72b-ov --port=30000 --tp-size=8

 python3 http_llava_onevision_test.py
 """
2 changes: 1 addition & 1 deletion examples/runtime/multimodal_embedding.py
@@ -1,5 +1,5 @@
 # launch server
-# python -m sglang.launch_server --model-path Alibaba-NLP/gme-Qwen2-VL-2B-Instruct --is-embedding --chat-template gme-qwen2-vl
+# python -m sglang.launch_server --model-path Alibaba-NLP/gme-Qwen2-VL-2B-Instruct --is-embedding

 import requests

135 changes: 57 additions & 78 deletions python/sglang/lang/chat_template.py
@@ -1,3 +1,4 @@
+import re
 from dataclasses import dataclass
 from enum import Enum, auto
 from typing import Callable, Dict, List, Tuple
@@ -71,9 +72,9 @@ def get_chat_template(name):

 def get_chat_template_by_model_path(model_path):
     for matching_func in matching_function_registry:
-        template = matching_func(model_path)
-        if template is not None:
-            return template
+        template_name = matching_func(model_path)
+        if template_name is not None:
+            return get_chat_template(template_name)
     return get_chat_template("default")
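
The hunk above moves the template lookup out of the individual matching functions: each matcher now returns a template name (or `None`), and `get_chat_template_by_model_path` resolves that name once. A minimal, self-contained sketch of this pattern follows. The registry contents, the decorator body, and the two matchers are simplified stand-ins assumed for illustration, not the actual contents of `chat_template.py`.

```python
import re
from typing import Callable, Dict, List, Optional

# Assumption: plain dicts stand in for the real ChatTemplate objects.
chat_template_registry: Dict[str, dict] = {
    "default": {"name": "default"},
    "qwen2-vl": {"name": "qwen2-vl"},
    "chatml-llava": {"name": "chatml-llava"},
}
matching_function_registry: List[Callable[[str], Optional[str]]] = []


def register_chat_template_matching_function(func):
    # Assumed decorator body: record the matcher and return it unchanged.
    matching_function_registry.append(func)
    return func


def get_chat_template(name: str) -> dict:
    return chat_template_registry[name]


def get_chat_template_by_model_path(model_path: str) -> dict:
    # Matchers return a template *name*; the lookup happens here, once.
    for matching_func in matching_function_registry:
        template_name = matching_func(model_path)
        if template_name is not None:
            return get_chat_template(template_name)
    return get_chat_template("default")


@register_chat_template_matching_function
def match_qwen_vl(model_path: str) -> Optional[str]:
    # Simplified matcher using one pattern from the diff.
    if re.search(r"qwen.*vl", model_path, re.IGNORECASE):
        return "qwen2-vl"
    return None


@register_chat_template_matching_function
def match_llava_onevision(model_path: str) -> Optional[str]:
    # Simplified matcher using one alternative from the diff's chatml-llava pattern.
    if re.search(r"llava-onevision-qwen2", model_path, re.IGNORECASE):
        return "chatml-llava"
    return None


if __name__ == "__main__":
    for path in (
        "Qwen/Qwen2-VL-7B-Instruct",
        "lmms-lab/llava-onevision-qwen2-7b-ov",
        "some-org/unmatched-model",  # hypothetical path, falls back to "default"
    ):
        print(path, "->", get_chat_template_by_model_path(path)["name"])
```

Returning names keeps the matchers free of registry lookups, so they can be tested as plain string functions.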


@@ -479,134 +480,112 @@ def get_chat_template_by_model_path(model_path):

 @register_chat_template_matching_function
 def match_deepseek(model_path: str):
-    if (
-        "deepseek-v3" in model_path.lower() or "deepseek-r1" in model_path.lower()
-    ) and "base" not in model_path.lower():
-        return get_chat_template("deepseek-v3")
+    if re.search(r"deepseek-(v3|r1)", model_path, re.IGNORECASE) and not re.search(
+        r"base", model_path, re.IGNORECASE
+    ):
+        return "deepseek-v3"


 @register_chat_template_matching_function
 def match_deepseek_janus_pro(model_path: str):
-    if "janus" in model_path.lower():
-        return get_chat_template("janus-pro")
+    if re.search(r"janus", model_path, re.IGNORECASE):
+        return "janus-pro"


 @register_chat_template_matching_function
 def match_dbrx(model_path: str):
-    if "dbrx" in model_path.lower() and "instruct" in model_path.lower():
-        return get_chat_template("dbrx-instruct")
+    if re.search(r"dbrx", model_path, re.IGNORECASE) and re.search(
+        r"instruct", model_path, re.IGNORECASE
+    ):
+        return "dbrx-instruct"


 @register_chat_template_matching_function
 def match_vicuna(model_path: str):
-    if "vicuna" in model_path.lower():
-        return get_chat_template("vicuna_v1.1")
-    if "llava-v1.5" in model_path.lower():
-        return get_chat_template("vicuna_v1.1")
-    if "llava-next-video-7b" in model_path.lower():
-        return get_chat_template("vicuna_v1.1")
+    if re.search(r"vicuna|llava-v1\.5|llava-next-video-7b", model_path, re.IGNORECASE):
+        return "vicuna_v1.1"


 @register_chat_template_matching_function
 def match_llama2_chat(model_path: str):
-    model_path = model_path.lower()
-    if "llama-2" in model_path and "chat" in model_path:
-        return get_chat_template("llama-2-chat")
-    if (
-        "mistral" in model_path or "mixtral" in model_path
-    ) and "instruct" in model_path:
-        return get_chat_template("llama-2-chat")
-    if "codellama" in model_path and "instruct" in model_path:
-        return get_chat_template("llama-2-chat")
+    if re.search(
+        r"llama-2.*chat|(mistral|mixtral).*instruct|codellama.*instruct",
+        model_path,
+        re.IGNORECASE,
+    ):
+        return "llama-2-chat"


 @register_chat_template_matching_function
 def match_llama3_instruct(model_path: str):
-    model_path = model_path.lower()
-    if "llama-3" in model_path and "instruct" in model_path:
-        return get_chat_template("llama-3-instruct")
+    if re.search(r"llama-3.*instruct", model_path, re.IGNORECASE):
+        return "llama-3-instruct"


 @register_chat_template_matching_function
 def match_chat_ml(model_path: str):
-    # import pdb;pdb.set_trace()
-    model_path = model_path.lower()
-    if "tinyllama" in model_path:
-        return get_chat_template("chatml")
-    # Now the suffix for qwen2 chat model is "instruct"
-    if "qwen" in model_path and "vl" in model_path:
-        return get_chat_template("qwen2-vl")
-    if "qwen" in model_path:
-        if "vl" in model_path:
-            return get_chat_template("qwen2-vl")
-        if ("chat" in model_path or "instruct" in model_path) and (
-            "llava" not in model_path
-        ):
-            return get_chat_template("qwen")
-    if (
-        "llava-v1.6-34b" in model_path
-        or "llava-v1.6-yi-34b" in model_path
-        or "llava-next-video-34b" in model_path
-        or "llava-onevision-qwen2" in model_path
+    if re.search(r"tinyllama", model_path, re.IGNORECASE):
+        return "chatml"
+    if re.search(r"qwen.*vl", model_path, re.IGNORECASE):
+        return "qwen2-vl"
+    if re.search(r"qwen.*(chat|instruct)", model_path, re.IGNORECASE) and not re.search(
+        r"llava", model_path, re.IGNORECASE
     ):
-        return get_chat_template("chatml-llava")
+        return "qwen"
+    if re.search(
+        r"llava-v1\.6-34b|llava-v1\.6-yi-34b|llava-next-video-34b|llava-onevision-qwen2",
+        model_path,
+        re.IGNORECASE,
+    ):
+        return "chatml-llava"


 @register_chat_template_matching_function
 def match_chat_yi(model_path: str):
-    model_path = model_path.lower()
-    if "yi-vl" in model_path and "llava" not in model_path:
-        return get_chat_template("yi-vl")
-    elif "yi-1.5" in model_path and "chat" in model_path:
-        return get_chat_template("yi-1.5")
+    if re.search(r"yi-vl", model_path, re.IGNORECASE) and not re.search(
+        r"llava", model_path, re.IGNORECASE
+    ):
+        return "yi-vl"
+    elif re.search(r"yi-1\.5.*chat", model_path, re.IGNORECASE):
+        return "yi-1.5"


 @register_chat_template_matching_function
 def match_gemma_it(model_path: str):
-    model_path = model_path.lower()
-    if "gemma" in model_path and "it" in model_path:
-        return get_chat_template("gemma-it")
+    if re.search(r"gemma.*it", model_path, re.IGNORECASE):
+        return "gemma-it"


 @register_chat_template_matching_function
 def match_openbmb_minicpm(model_path: str):
-    model_path = model_path.lower()
-    if "minicpm-v" in model_path:
-        return get_chat_template("minicpmv")
-    elif "minicpm-o" in model_path:
-        return get_chat_template("minicpmo")
+    if re.search(r"minicpm-v", model_path, re.IGNORECASE):
+        return "minicpmv"
+    elif re.search(r"minicpm-o", model_path, re.IGNORECASE):
+        return "minicpmo"


 @register_chat_template_matching_function
 def match_c4ai_command_r(model_path: str):
-    model_path = model_path.lower()
-    if "c4ai-command-r" in model_path:
-        return get_chat_template("c4ai-command-r")
+    if re.search(r"c4ai-command-r", model_path, re.IGNORECASE):
+        return "c4ai-command-r"


 @register_chat_template_matching_function
 def match_granite_instruct(model_path: str):
-    model_path = model_path.lower()
-    # When future versions of Granite are released, this code may
-    # need to be updated. For now, assume that the Granite 3.0
-    # template works across the board.
-    if "granite" in model_path and "instruct" in model_path:
-        return get_chat_template("granite-3-instruct")
+    if re.search(r"granite.*instruct", model_path, re.IGNORECASE):
+        return "granite-3-instruct"


 @register_chat_template_matching_function
 def match_gemma3_instruct(model_path: str):
-    model_path = model_path.lower()
-    if "gemma-3" in model_path and "1b" not in model_path:
-        # gemma-3-1b-it is completion model
-        return get_chat_template("gemma-it")
+    if re.search(r"gemma-3", model_path, re.IGNORECASE):
+        return "gemma-it"


 @register_chat_template_matching_function
 def match_internvl_chat(model_path: str):
-    model_path = model_path.lower()
-    if "internvl" in model_path:
-        return get_chat_template("internvl-2-5")
+    if re.search(r"internvl2_5", model_path, re.IGNORECASE):
+        return "internvl-2-5"


 if __name__ == "__main__":
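
One behavioral detail of the regex rewrite in `chat_template.py`: a pattern such as `gemma.*it` requires the two substrings to appear in that order, whereas the removed check `"gemma" in model_path and "it" in model_path` accepted them in either order. The sketch below compares the two forms. The helper names `old_match_gemma_it` and `new_match_gemma_it` and the path `it-team/gemma-2-9b` are hypothetical, introduced only for illustration.

```python
import re


def old_match_gemma_it(model_path: str) -> bool:
    # Removed form: two independent substring checks, order-insensitive.
    path = model_path.lower()
    return "gemma" in path and "it" in path


def new_match_gemma_it(model_path: str) -> bool:
    # Added form: "gemma" must appear before "it" in the path.
    return re.search(r"gemma.*it", model_path, re.IGNORECASE) is not None


if __name__ == "__main__":
    for path in ("google/gemma-2-9b-it", "it-team/gemma-2-9b"):
        print(path, old_match_gemma_it(path), new_match_gemma_it(path))
```

The two forms agree on the first path and disagree on the second, where `it` only appears before `gemma`. The same ordering constraint applies to the other `a.*b` patterns in the diff, such as `llama-2.*chat` and `granite.*instruct`.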