
Commit d493c0d

docs: add best practice (#60)
1 parent 2bf474b commit d493c0d

5 files changed: +169 -156 lines changed

README.md

Lines changed: 7 additions & 7 deletions

@@ -20,10 +20,6 @@ Easy Model Deployer is a lightweight tool designed to simplify the deployment of
 
 ![cli](docs/images/cli.gif)
 
-**Supported Models**
-
-For a detailed list of supported models, please refer to [Supported Models](docs/en/supported_models.md)
-
 **Key Features**
 - One-click deployment of models to the cloud (Amazon SageMaker, Amazon ECS, Amazon EC2)
 - Diverse model types (LLMs, VLMs, Embeddings, Vision, etc.)

@@ -76,9 +72,13 @@ Deploy models with an interactive CLI or one command.
 emd deploy
 ```
 
-> **💡 Tip** To view all available parameters, run `emd deploy --help`.
-> When you see the message "Waiting for model: ...", it means the deployment task has started and you can stop the terminal output by `Ctrl+C`.
-> For more information on deployment parameters, please refer to the [Deployment parameters](docs/en/deployment.md).
+![deploy](docs/images/emd-deploy.png)
+
+> **Note:** To view all available parameters, run `emd deploy --help`.
+> When you see the message "Waiting for model: ...", it means the deployment task has started and you can stop the terminal output by pressing `Ctrl+C`.
+> For more information on deployment parameters, please refer to the [Deployment parameters](docs/en/installation.md).
+> For best practice examples of using command line parameters, please refer to the [Best Deployment Practices](docs/en/best_deployment_practices.md).
 
 
 ### Show Status
docs/en/best_deployment_practices.md

Lines changed: 55 additions & 0 deletions

@@ -0,0 +1,55 @@

# Best Deployment Practices

This document provides best-practice examples for deploying models with EMD across various use cases.

## Deploying to Specific GPU Types

Choosing the right GPU type is critical for optimal performance and cost-efficiency. Use the `--instance-type` parameter to specify the GPU instance.

### Example: Deploying Qwen2.5-7B on g5.2xlarge

```bash
emd deploy --model-id Qwen2.5-7B-Instruct --instance-type g5.2xlarge --engine-type vllm --service-type sagemaker
```

## Achieving Longer Context Windows

To enable longer context windows, use the `--extra-params` option with engine-specific parameters.

### Example: Deploying a model with a 16k context window

```bash
emd deploy --model-id Qwen2.5-7B-Instruct --instance-type g5.4xlarge --engine-type vllm --service-type sagemaker --extra-params '{
  "engine_params": {
    "vllm_cli_args": "--max_model_len 16000 --max_num_seqs 4"
  }
}'
```

### Example: Deploying a model on a G4dn instance

```bash
emd deploy --model-id Qwen2.5-14B-Instruct-AWQ --instance-type g4dn.2xlarge --engine-type vllm --service-type sagemaker --extra-params '{
  "engine_params": {
    "environment_variables": "export VLLM_ATTENTION_BACKEND=XFORMERS && export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True",
    "default_cli_args": " --chat-template emd/models/chat_templates/qwen_2d5_add_prefill_chat_template.jinja --max_model_len 12000 --max_num_seqs 10 --gpu_memory_utilization 0.95 --disable-log-stats --enable-auto-tool-choice --tool-call-parser hermes"
  }
}'
```

## Common Troubleshooting

If your deployment fails due to out-of-memory issues, try the following (a combined example follows this list):

- Using a larger instance type
- Reducing `max_model_len` and `max_num_seqs` in the engine parameters
- Setting a lower `gpu_memory_utilization` value (e.g., 0.8 instead of the default)
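
A hedged sketch that combines these mitigations for the Qwen2.5-7B example above; the specific values are illustrative assumptions, not tuned recommendations:

```bash
# Illustrative only: larger instance, shorter context, smaller batch,
# and a lower GPU memory target.
emd deploy --model-id Qwen2.5-7B-Instruct --instance-type g5.4xlarge --engine-type vllm --service-type sagemaker --extra-params '{
  "engine_params": {
    "default_cli_args": "--max_model_len 8000 --max_num_seqs 2 --gpu_memory_utilization 0.8"
  }
}'
```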

docs/en/deployment.md

Lines changed: 0 additions & 145 deletions
This file was deleted.

docs/en/installation.md

Lines changed: 106 additions & 3 deletions

@@ -1,10 +1,10 @@
-# Installation Guide
+## Installation Guide
 
-## Prerequisites
+### Prerequisites
 - Python 3.9 or higher
 - pip (Python package installer)
 
-## Setting up the Environment
+### Setting up the Environment
 
 1. Create a virtual environment:
 ```bash

@@ -20,3 +20,106 @@ source emd-env/bin/activate
```bash
pip install https://github.com/aws-samples/easy-model-deployer/releases/download/main/emd-0.6.0-py3-none-any.whl
```

## Deployment parameters

### --force-update-env-stack

With this flag, no additional `emd bootstrap` is required before deployment. However, other commands such as `status` and `destroy` still require pre-bootstrapping, so it is recommended to run `emd bootstrap` separately after each upgrade.
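
A hedged sketch of the recommended flow; whether `--force-update-env-stack` is a plain boolean switch on `emd deploy` is an assumption here, not confirmed by this section:

```bash
# Run bootstrap separately after each upgrade so that `emd status` and
# `emd destroy` have the prerequisite environment stack in place.
emd bootstrap

# Deploy; the flag below (assumed to be a boolean switch) lets the deployment
# refresh the environment stack itself.
emd deploy --force-update-env-stack
```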

### --extra-params

Extra parameters passed to the model deployment. The value of `--extra-params` should be a JSON object with the following structure:

```json
{
  "model_params": {
  },
  "service_params": {
  },
  "instance_params": {
  },
  "engine_params": {
    "cli_args": "<command line arguments of current engine>",
    "api_key": "<api key>"
  },
  "framework_params": {
    "uvicorn_log_level": "info",
    "limit_concurrency": 200
  }
}
```

For practical examples, please refer to the [Best Deployment Practices](best_deployment_practices.md).
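
As a further illustration, a minimal sketch of passing only `framework_params` on the command line; the model ID, instance type, and values below are illustrative and simply reuse the schema above:

```bash
# Illustrative only: tune the FastAPI/uvicorn framework settings at deploy time.
emd deploy --model-id Qwen2.5-7B-Instruct --instance-type g5.2xlarge --engine-type vllm --service-type sagemaker --extra-params '{
  "framework_params": {
    "uvicorn_log_level": "info",
    "limit_concurrency": 200
  }
}'
```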

## Local deployment on an EC2 instance

This is suitable for deploying models using local GPU resources.

### Prerequisites

#### Start and connect to an EC2 instance

It is recommended to launch the instance using the AMI "**Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.6 (Ubuntu 22.04)**".
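
A minimal sketch of launching and connecting to such an instance with the AWS CLI; the AMI ID, key pair, security group, and instance type are placeholders or assumptions you must replace with your own values:

```bash
# Launch a GPU instance from the recommended Deep Learning AMI (placeholder values).
aws ec2 run-instances \
  --image-id <dlami-ami-id> \
  --instance-type g5.2xlarge \
  --key-name <your-key-pair> \
  --security-group-ids <your-security-group-id> \
  --count 1

# Connect once the instance is running (Ubuntu AMIs use the "ubuntu" user).
ssh -i <your-key-pair>.pem ubuntu@<instance-public-ip>
```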

### Deploy a model using EMD

```sh
emd deploy --allow-local-deploy
```

There are some sample EMD configuration settings for model deployment in the following two sections: [Non-reasoning Model deployment configuration](#non-reasoning-model-deployment-configuration) and [Reasoning Model deployment configuration](#reasoning-model-deployment-configuration).
Wait for the model deployment to complete.

#### Non-reasoning Model deployment configuration

##### Qwen2.5-72B-Instruct-AWQ

```
? Select the model series: qwen2.5
? Select the model name: Qwen2.5-72B-Instruct-AWQ
? Select the service for deployment: Local
? input the local gpu ids to deploy the model (e.g. 0,1,2): 0,1,2,3
? Select the inference engine to use: tgi
? (Optional) Additional deployment parameters (JSON string or local file path), you can skip by pressing Enter: {"engine_params":{"api_key":"<YOUR_API_KEY>", "default_cli_args": "--max-total-tokens 30000 --max-concurrent-requests 30"}}
```

##### llama-3.3-70b-instruct-awq

```
? Select the model series: llama
? Select the model name: llama-3.3-70b-instruct-awq
? Select the service for deployment: Local
? input the local gpu ids to deploy the model (e.g. 0,1,2): 0,1,2,3
engine type: tgi
framework type: fastapi
? (Optional) Additional deployment parameters (JSON string or local file path), you can skip by pressing Enter: {"engine_params":{"api_key":"<YOUR_API_KEY>", "default_cli_args": "--max-total-tokens 30000 --max-concurrent-requests 30"}}
```

#### Reasoning Model deployment configuration

##### DeepSeek-R1-Distill-Qwen-32B

```
? Select the model series: deepseek reasoning model
? Select the model name: DeepSeek-R1-Distill-Qwen-32B
? Select the service for deployment: Local
? input the local gpu ids to deploy the model (e.g. 0,1,2): 0,1,2,3
engine type: vllm
framework type: fastapi
? (Optional) Additional deployment parameters (JSON string or local file path), you can skip by pressing Enter: {"engine_params":{"api_key":"<YOUR_API_KEY>", "default_cli_args": "--enable-reasoning --reasoning-parser deepseek_r1 --max_model_len 16000 --disable-log-stats --chat-template emd/models/chat_templates/deepseek_r1_distill.jinja --max_num_seq 20 --gpu_memory_utilization 0.9"}}
```

##### deepseek-r1-distill-llama-70b-awq

```
? Select the model series: deepseek reasoning model
? Select the model name: deepseek-r1-distill-llama-70b-awq
? Select the service for deployment: Local
? input the local gpu ids to deploy the model (e.g. 0,1,2): 0,1,2,3
? Select the inference engine to use: tgi
framework type: fastapi
? (Optional) Additional deployment parameters (JSON string or local file path), you can skip by pressing Enter: {"engine_params":{"api_key":"<YOUR_API_KEY>", "default_cli_args": "--max-total-tokens 30000 --max-concurrent-requests 30"}}
```

## Examples

docs/mkdocs.en.yml

Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@ theme:
 nav:
   - Architecture: architecture.md
   - Installation: installation.md
-  - Deployment: deployment.md
+  - Best-Deployment-Practices: best_deployment_practices.md
   - EMD-Client: emd_client.md
   - Langchain-Interface: langchain_interface.md
   - OpenAI-Compatiable: openai_compatiable.md
