**Key Features**
- One-click deployment of models to the cloud (Amazon SageMaker, Amazon ECS, Amazon EC2)
- Diverse model types (LLMs, VLMs, Embeddings, Vision, etc.)

Deploy models with an interactive CLI or one command.

```sh
emd deploy
```
> **Note:** To view all available parameters, run `emd deploy --help`.
> When you see the message "Waiting for model: ...", it means the deployment task has started and you can stop the terminal output by pressing `Ctrl+C`.
> For more information on deployment parameters, please refer to the [Deployment parameters](docs/en/installation.md).
> For best practice examples of using command line parameters, please refer to the [Best Deployment Practices](docs/en/best_deployment_practices.md).

No additional `emd bootstrap` is required for deployment. However, other commands such as `status` and `destroy` require pre-bootstrapping, so it is recommended to run `emd bootstrap` separately after each upgrade.
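
A minimal sketch of the recommended upgrade flow (the upgrade command is a placeholder; use whatever method you installed EMD with):

```sh
# Upgrade the EMD CLI first (placeholder package name; adjust to your install method)
pip install --upgrade <emd-package>

# Re-run bootstrap so commands such as `emd status` and `emd destroy` keep working
emd bootstrap
```
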
### --extra-params

Extra parameters passed to the model deployment. `--extra-params` should be a JSON object in the following format:

```json
{
    "model_params": {},
    "service_params": {},
    "instance_params": {},
    "engine_params": {
        "cli_args": "<command line arguments of current engine>",
        "api_key": "<api key>"
    },
    "framework_params": {
        "uvicorn_log_level": "info",
        "limit_concurrency": 200
    }
}
```
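
For example, the extra parameters can be passed inline when deploying. A minimal sketch (the values shown are illustrative only; per the interactive prompt, a local file path can be supplied instead of a JSON string):

```sh
# Pass extra parameters inline as a JSON string (example values only)
emd deploy --extra-params '{"framework_params": {"uvicorn_log_level": "info", "limit_concurrency": 200}}'
```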

For more practical examples, please refer to the [Best Deployment Practices](docs/en/best_deployment_practices.md).

## Local deployment on an EC2 instance

This is suitable for deploying models using local GPU resources.

### Prerequisites

#### Start and connect to an EC2 instance

It is recommended to launch the instance using the AMI "**Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.6 (Ubuntu 22.04)**".
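
Once the instance is running, connect to it over SSH. A minimal sketch, assuming a key pair and the default `ubuntu` user of this Ubuntu-based AMI (the key path and host are placeholders):

```sh
# Connect to the instance (replace with your own key file and instance address)
ssh -i /path/to/your-key.pem ubuntu@<instance-public-dns>
```
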
### Deploy model using EMD

```sh
emd deploy --allow-local-deploy
```

Sample EMD configuration settings for model deployment are given in the following two sections: [Non-reasoning Model deployment configuration](#non-reasoning-model-deployment-configuration) and [Reasoning Model deployment configuration](#reasoning-model-deployment-configuration).

Wait for the model deployment to complete.
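
While you wait, the deployment state can be checked from the same terminal. A minimal sketch, assuming the `status` subcommand mentioned above reports in-progress deployments:

```sh
# Check the status of deployment tasks (requires prior `emd bootstrap`)
emd status
```
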
#### Non-reasoning Model deployment configuration
##### Qwen2.5-72B-Instruct-AWQ

```
? Select the model series: qwen2.5
? Select the model name: Qwen2.5-72B-Instruct-AWQ
? Select the service for deployment: Local
? input the local gpu ids to deploy the model (e.g. 0,1,2): 0,1,2,3
? Select the inference engine to use: tgi
? (Optional) Additional deployment parameters (JSON string or local file path), you can skip by pressing Enter: {"engine_params":{"api_key":"<YOUR_API_KEY>", "default_cli_args": "--max-total-tokens 30000 --max-concurrent-requests 30"}}
```

##### llama-3.3-70b-instruct-awq

```
? Select the model series: llama
? Select the model name: llama-3.3-70b-instruct-awq
? Select the service for deployment: Local
? input the local gpu ids to deploy the model (e.g. 0,1,2): 0,1,2,3
engine type: tgi
framework type: fastapi
? (Optional) Additional deployment parameters (JSON string or local file path), you can skip by pressing Enter: {"engine_params":{"api_key":"<YOUR_API_KEY>", "default_cli_args": "--max-total-tokens 30000 --max-concurrent-requests 30"}}
```

#### Reasoning Model deployment configuration
##### DeepSeek-R1-Distill-Qwen-32B

```
? Select the model series: deepseek reasoning model
? Select the model name: DeepSeek-R1-Distill-Qwen-32B
? Select the service for deployment: Local
? input the local gpu ids to deploy the model (e.g. 0,1,2): 0,1,2,3
engine type: vllm
framework type: fastapi
? (Optional) Additional deployment parameters (JSON string or local file path), you can skip by pressing Enter: {"engine_params":{"api_key":"<YOUR_API_KEY>", "default_cli_args": "--enable-reasoning --reasoning-parser deepseek_r1 --max_model_len 16000 --disable-log-stats --chat-template emd/models/chat_templates/deepseek_r1_distill.jinja --max_num_seq 20 --gpu_memory_utilization 0.9"}}
```

##### deepseek-r1-distill-llama-70b-awq

```
? Select the model series: deepseek reasoning model
? Select the model name: deepseek-r1-distill-llama-70b-awq
? Select the service for deployment: Local
? input the local gpu ids to deploy the model (e.g. 0,1,2): 0,1,2,3
? Select the inference engine to use: tgi
framework type: fastapi
? (Optional) Additional deployment parameters (JSON string or local file path), you can skip by pressing Enter: {"engine_params":{"api_key":"<YOUR_API_KEY>", "default_cli_args": "--max-total-tokens 30000 --max-concurrent-requests 30"}}
```