
Commit d493c0d

docs: add best practice (#60)
1 parent 2bf474b commit d493c0d

5 files changed: +169 -156 lines changed

README.md

Lines changed: 7 additions & 7 deletions

@@ -20,10 +20,6 @@ Easy Model Deployer is a lightweight tool designed to simplify the deployment of
 
 ![cli](docs/images/cli.gif)
 
-**Supported Models**
-
-For a detailed list of supported models, please refer to [Supported Models](docs/en/supported_models.md)
-
 **Key Features**
 - One-click deployment of models to the cloud (Amazon SageMaker, Amazon ECS, Amazon EC2)
 - Diverse model types (LLMs, VLMs, Embeddings, Vision, etc.)

@@ -76,9 +72,13 @@ Deploy models with an interactive CLI or one command.
 emd deploy
 ```
 
-> **💡 Tip** To view all available parameters, run `emd deploy --help`.
-> When you see the message "Waiting for model: ...", it means the deployment task has started and you can stop the terminal output by `Ctrl+C`.
-> For more information on deployment parameters, please refer to the [Deployment parameters](docs/en/deployment.md).
+![deploy](docs/images/emd-deploy.png)
+
+> **Note:** To view all available parameters, run `emd deploy --help`.
+> When you see the message "Waiting for model: ...", it means the deployment task has started and you can stop the terminal output by pressing `Ctrl+C`.
+> For more information on deployment parameters, please refer to the [Deployment parameters](docs/en/installation.md).
+> For best practice examples of using command line parameters, please refer to the [Best Deployment Practices](docs/en/best_deployment_practices.md).
 
 
 ### Show Status
docs/en/best_deployment_practices.md

Lines changed: 55 additions & 0 deletions

@@ -0,0 +1,55 @@

# Best Deployment Practices

This document provides best-practice examples for deploying models with EMD across various use cases.

## Deploying to Specific GPU Types

Choosing the right GPU type is critical for optimal performance and cost-efficiency. Use the `--instance-type` parameter to specify the GPU instance.

### Example: Deploying Qwen2.5-7B on g5.2xlarge

```bash
emd deploy --model-id Qwen2.5-7B-Instruct --instance-type g5.2xlarge --engine-type vllm --service-type sagemaker
```

## Achieving Longer Context Windows

To enable longer context windows, use the `--extra-params` option with engine-specific parameters.

### Example: Deploying a model with a 16k context window

```bash
emd deploy --model-id Qwen2.5-7B-Instruct --instance-type g5.4xlarge --engine-type vllm --service-type sagemaker --extra-params '{
  "engine_params": {
    "vllm_cli_args": "--max_model_len 16000 --max_num_seqs 4"
  }
}'
```

### Example: Deploying a model on a G4dn instance

```bash
emd deploy --model-id Qwen2.5-14B-Instruct-AWQ --instance-type g4dn.2xlarge --engine-type vllm --service-type sagemaker --extra-params '{
  "engine_params": {
    "environment_variables": "export VLLM_ATTENTION_BACKEND=XFORMERS && export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True",
    "default_cli_args": " --chat-template emd/models/chat_templates/qwen_2d5_add_prefill_chat_template.jinja --max_model_len 12000 --max_num_seqs 10 --gpu_memory_utilization 0.95 --disable-log-stats --enable-auto-tool-choice --tool-call-parser hermes"
  }
}'
```

## Common Troubleshooting

If your deployment fails due to out-of-memory issues, try the following (a combined example follows this list):

- Using a larger instance type
- Reducing `max_model_len` and `max_num_seqs` in the engine parameters
- Setting a lower `gpu_memory_utilization` value (e.g., 0.8 instead of the default)
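
A hedged sketch that combines these mitigations for the Qwen2.5-7B example above; the specific values are illustrative assumptions, not tuned recommendations:

```bash
# Illustrative only: larger instance, shorter context, smaller batch,
# and a lower GPU memory target.
emd deploy --model-id Qwen2.5-7B-Instruct --instance-type g5.4xlarge --engine-type vllm --service-type sagemaker --extra-params '{
  "engine_params": {
    "default_cli_args": "--max_model_len 8000 --max_num_seqs 2 --gpu_memory_utilization 0.8"
  }
}'
```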

docs/en/deployment.md

Lines changed: 0 additions & 145 deletions
This file was deleted.

docs/en/installation.md

Lines changed: 106 additions & 3 deletions

@@ -1,10 +1,10 @@
-# Installation Guide
+## Installation Guide
 
-## Prerequisites
+### Prerequisites
 - Python 3.9 or higher
 - pip (Python package installer)
 
-## Setting up the Environment
+### Setting up the Environment
 
 1. Create a virtual environment:
 ```bash

@@ -20,3 +20,106 @@ source emd-env/bin/activate
```bash
pip install https://github.com/aws-samples/easy-model-deployer/releases/download/main/emd-0.6.0-py3-none-any.whl
```

## Deployment parameters

### --force-update-env-stack

With this flag, no additional `emd bootstrap` is required before deployment. However, other commands such as `status` and `destroy` still require pre-bootstrapping, so it is recommended to run `emd bootstrap` separately after each upgrade.
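
A hedged sketch of the recommended flow; whether `--force-update-env-stack` is a plain boolean switch on `emd deploy` is an assumption here, not confirmed by this section:

```bash
# Run bootstrap separately after each upgrade so that `emd status` and
# `emd destroy` have the prerequisite environment stack in place.
emd bootstrap

# Deploy; the flag below (assumed to be a boolean switch) lets the deployment
# refresh the environment stack itself.
emd deploy --force-update-env-stack
```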

### --extra-params

Extra parameters passed to the model deployment. The value of `--extra-params` should be a JSON object with the following structure:

```json
{
  "model_params": {
  },
  "service_params": {
  },
  "instance_params": {
  },
  "engine_params": {
    "cli_args": "<command line arguments of current engine>",
    "api_key": "<api key>"
  },
  "framework_params": {
    "uvicorn_log_level": "info",
    "limit_concurrency": 200
  }
}
```

For practical examples, please refer to the [Best Deployment Practices](best_deployment_practices.md).
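
As a further illustration, a minimal sketch of passing only `framework_params` on the command line; the model ID, instance type, and values below are illustrative and simply reuse the schema above:

```bash
# Illustrative only: tune the FastAPI/uvicorn framework settings at deploy time.
emd deploy --model-id Qwen2.5-7B-Instruct --instance-type g5.2xlarge --engine-type vllm --service-type sagemaker --extra-params '{
  "framework_params": {
    "uvicorn_log_level": "info",
    "limit_concurrency": 200
  }
}'
```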

## Local deployment on an EC2 instance

This is suitable for deploying models using local GPU resources.

### Prerequisites

#### Start and connect to an EC2 instance

It is recommended to launch the instance using the AMI "**Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.6 (Ubuntu 22.04)**".
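
A minimal sketch of launching and connecting to such an instance with the AWS CLI; the AMI ID, key pair, security group, and instance type are placeholders or assumptions you must replace with your own values:

```bash
# Launch a GPU instance from the recommended Deep Learning AMI (placeholder values).
aws ec2 run-instances \
  --image-id <dlami-ami-id> \
  --instance-type g5.2xlarge \
  --key-name <your-key-pair> \
  --security-group-ids <your-security-group-id> \
  --count 1

# Connect once the instance is running (Ubuntu AMIs use the "ubuntu" user).
ssh -i <your-key-pair>.pem ubuntu@<instance-public-ip>
```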

### Deploy a model using EMD

```sh
emd deploy --allow-local-deploy
```

There are some sample EMD configuration settings for model deployment in the following two sections: [Non-reasoning Model deployment configuration](#non-reasoning-model-deployment-configuration) and [Reasoning Model deployment configuration](#reasoning-model-deployment-configuration).
Wait for the model deployment to complete.

#### Non-reasoning Model deployment configuration

##### Qwen2.5-72B-Instruct-AWQ

```
? Select the model series: qwen2.5
? Select the model name: Qwen2.5-72B-Instruct-AWQ
? Select the service for deployment: Local
? input the local gpu ids to deploy the model (e.g. 0,1,2): 0,1,2,3
? Select the inference engine to use: tgi
? (Optional) Additional deployment parameters (JSON string or local file path), you can skip by pressing Enter: {"engine_params":{"api_key":"<YOUR_API_KEY>", "default_cli_args": "--max-total-tokens 30000 --max-concurrent-requests 30"}}
```

##### llama-3.3-70b-instruct-awq

```
? Select the model series: llama
? Select the model name: llama-3.3-70b-instruct-awq
? Select the service for deployment: Local
? input the local gpu ids to deploy the model (e.g. 0,1,2): 0,1,2,3
engine type: tgi
framework type: fastapi
? (Optional) Additional deployment parameters (JSON string or local file path), you can skip by pressing Enter: {"engine_params":{"api_key":"<YOUR_API_KEY>", "default_cli_args": "--max-total-tokens 30000 --max-concurrent-requests 30"}}
```

#### Reasoning Model deployment configuration

##### DeepSeek-R1-Distill-Qwen-32B

```
? Select the model series: deepseek reasoning model
? Select the model name: DeepSeek-R1-Distill-Qwen-32B
? Select the service for deployment: Local
? input the local gpu ids to deploy the model (e.g. 0,1,2): 0,1,2,3
engine type: vllm
framework type: fastapi
? (Optional) Additional deployment parameters (JSON string or local file path), you can skip by pressing Enter: {"engine_params":{"api_key":"<YOUR_API_KEY>", "default_cli_args": "--enable-reasoning --reasoning-parser deepseek_r1 --max_model_len 16000 --disable-log-stats --chat-template emd/models/chat_templates/deepseek_r1_distill.jinja --max_num_seq 20 --gpu_memory_utilization 0.9"}}
```

##### deepseek-r1-distill-llama-70b-awq

```
? Select the model series: deepseek reasoning model
? Select the model name: deepseek-r1-distill-llama-70b-awq
? Select the service for deployment: Local
? input the local gpu ids to deploy the model (e.g. 0,1,2): 0,1,2,3
? Select the inference engine to use: tgi
framework type: fastapi
? (Optional) Additional deployment parameters (JSON string or local file path), you can skip by pressing Enter: {"engine_params":{"api_key":"<YOUR_API_KEY>", "default_cli_args": "--max-total-tokens 30000 --max-concurrent-requests 30"}}
```

## Examples

docs/mkdocs.en.yml

Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@ theme:
 nav:
   - Architecture: architecture.md
   - Installation: installation.md
-  - Deployment: deployment.md
+  - Best-Deployment-Practices: best_deployment_practices.md
   - EMD-Client: emd_client.md
   - Langchain-Interface: langchain_interface.md
   - OpenAI-Compatiable: openai_compatiable.md
