
Commit 8d4e1cc

update iluvatar gpu fastdeploy whl
1 parent bb6912a commit 8d4e1cc

File tree: 2 files changed, +4 -4 lines changed


docs/get_started/installation/iluvatar_gpu.md

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-# Run ERNIE-4.5-300B-A47B model on iluvatar machine
+# Run ERNIE-4.5-300B-A47B & ERNIE-4.5-21B-A3B model on iluvatar machine
 The current version of the software merely serves as a demonstration demo for the Iluvatar CoreX combined with the Fastdeploy inference framework for large models. There may be issues when running the latest ERNIE4.5 model, and we will conduct repairs and performance optimization in the future. Subsequent versions will provide customers with a more stable version.
 
 ## Machine Preparation
@@ -62,7 +62,7 @@ prompts = [
 sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
 
 # load the model
-llm = LLM(model="/home/paddle/ernie-4_5-300b-a47b-bf16-paddle", tensor_parallel_size=16, max_model_len=8192)
+llm = LLM(model="/home/paddle/ernie-4_5-300b-a47b-bf16-paddle", tensor_parallel_size=16, max_model_len=8192, static_decode_blocks=0, quantization='wint8')
 
 # Perform batch inference
 outputs = llm.generate(prompts, sampling_params)
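
For readers without the surrounding file, here is a minimal sketch of the offline-inference snippet this hunk edits, with the newly added `static_decode_blocks=0` and `quantization='wint8'` arguments in place. It assumes FastDeploy's offline `LLM`/`SamplingParams` API as used in the doc; the placeholder prompt list and the `output.outputs.text` attribute are assumptions, not taken from the diff.

```python
# Sketch of the offline-inference example modified by this commit.
# Assumes FastDeploy's offline API: `from fastdeploy import LLM, SamplingParams`.
from fastdeploy import LLM, SamplingParams

# Placeholder prompts; the doc's own list sits at the `prompts = [` context line above.
prompts = [
    "Hello, my name is",
]

# Sampling settings copied from the diff context.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)

# Load the model with the arguments added by this commit:
#   static_decode_blocks=0   -> newly added in this change
#   quantization='wint8'     -> weight-only INT8 quantization, newly added in this change
llm = LLM(
    model="/home/paddle/ernie-4_5-300b-a47b-bf16-paddle",
    tensor_parallel_size=16,
    max_model_len=8192,
    static_decode_blocks=0,
    quantization="wint8",
)

# Perform batch inference over all prompts.
outputs = llm.generate(prompts, sampling_params)

# Print generated text; the `.outputs.text` attribute is an assumption
# based on FastDeploy's vLLM-style result objects.
for output in outputs:
    print(output.outputs.text)
```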

docs/zh/get_started/installation/iluvatar_gpu.md

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-# How to run ERNIE-4.5-300B-A47B-BF16 on an Iluvatar machine
+# How to run ERNIE-4.5-300B-A47B-BF16 & ERNIE-4.5-21B-A3B on an Iluvatar machine
 The current software release is only a demo of large-model inference with Iluvatar chips + Fastdeploy. Running the latest ERNIE 4.5 model may still have issues; fixes and performance optimizations will follow to give customers a more stable version.
 
 ## Machine Preparation
@@ -62,7 +62,7 @@ prompts = [
 sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
 
 # load the model
-llm = LLM(model="/home/paddle/ernie-4_5-300b-a47b-bf16-paddle", tensor_parallel_size=16, max_model_len=8192)
+llm = LLM(model="/home/paddle/ernie-4_5-300b-a47b-bf16-paddle", tensor_parallel_size=16, max_model_len=8192, static_decode_blocks=0, quantization='wint8')
 
 # Perform batch inference (the llm internally queues requests and inserts them dynamically based on available resources)
 outputs = llm.generate(prompts, sampling_params)
