
Commit c6dfeb0

Changes to comps/llms/text-generation/README.md (opea-project#783)
* Removed redundant commands
* Changes to README and minor code improvements
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Changes added per review

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent a719b61 commit c6dfeb0

7 files changed (+161, -119 lines changed)

comps/llms/text-generation/README.md

Lines changed: 145 additions & 110 deletions
@@ -27,219 +27,254 @@ git clone https://github.com/opea-project/GenAIComps.git
 export OPEA_GENAICOMPS_ROOT=$(pwd)/GenAIComps
 ```

+## Prerequisites
+
+You must create a user account with [HuggingFace] and obtain permission to use the gated LLM models by adhering to the guidelines provided on the respective model's webpage. The environment variable `LLM_MODEL` would be the HuggingFace model id and `HF_TOKEN` is your HuggingFace account's "User Access Token".
+
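For readers following the new Prerequisites section, here is a minimal sketch of setting the two variables it names; the model id below is only an illustrative placeholder for a gated HuggingFace model, not a value taken from this commit.

```bash
# Illustrative values only; substitute the gated model you have access to
# and the "User Access Token" from your HuggingFace account settings.
export LLM_MODEL="meta-llama/Llama-2-7b-chat-hf"
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxx"
```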
 ## 🚀1. Start Microservice with Python (Option 1)

 To start the LLM microservice, you need to install python packages first.

 ### 1.1 Install Requirements

 ```bash
+# Install opea-comps
 pip install opea-comps
-pip install -r ${OPEA_GENAICOMPS_ROOT}/comps/llms/requirements.txt
-
-# Install requirements of your choice of microservice in the text-generation folder (tgi, vllm, vllm-ray, etc.)
-export MICROSERVICE_DIR=your_chosen_microservice
-
-pip install -r ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/${MICROSERVICE_DIR}/requirements.txt
-```

-Set an environment variable `your_ip` to the IP address of the machine where you would like to consume the microservice.
+# Install requirements from comps/llms
+cd ${OPEA_GENAICOMPS_ROOT}/comps/llms

-```bash
-# For example, this command would set the IP address of your currently logged-in machine.
-export your_ip=$(hostname -I | awk '{print $1}')
+pip install -r requirements.txt
 ```

 ### 1.2 Start LLM Service with Python Script

 #### 1.2.1 Start the TGI Service

-```bash
-export TGI_LLM_ENDPOINT="http://${your_ip}:8008"
-python ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/tgi/llm.py
-python ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/tgi/llm.py
-```
-
-#### 1.2.2 Start the vLLM Service
+Install the requirements for TGI Service

 ```bash
-export vLLM_LLM_ENDPOINT="http://${your_ip}:8008"
-python ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/vllm/llm.py
-python ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/vllm/llm.py
-```
-
-#### 1.2.3 Start the Ray Service
+cd ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/tgi

-```bash
-export RAY_Serve_ENDPOINT="http://${your_ip}:8008"
-python ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/ray_serve/llm.py
-python ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/ray_serve/llm.py
+pip install -r requirements.txt
 ```

-## 🚀2. Start Microservice with Docker (Option 2)
-
-You can use either a published docker image or build your own docker image with the respective microservice Dockerfile of your choice. You must create a user account with [HuggingFace] and obtain permission to use the restricted LLM models by adhering to the guidelines provided on the respective model's webpage.
-
-### 2.1 Start LLM Service with published image
-
-#### 2.1.1 Start TGI Service
+Execute the docker run command to initiate the backend, along with the Python script that launches the microservice.

 ```bash
-export HF_LLM_MODEL=${your_hf_llm_model}
+export TGI_HOST_IP=$(hostname -I | awk '{print $1}') # This sets IP of the current machine
+export LLM_MODEL=${your_hf_llm_model}
+export DATA_DIR=$HOME/data # Location to download the model
 export HF_TOKEN=${your_hf_api_token}

-docker run \
+# Initiate the backend
+docker run -d \
 -p 8008:80 \
 -e HF_TOKEN=${HF_TOKEN} \
--v ./data:/data \
+-v ${DATA_DIR}:/data \
 --name tgi_service \
 --shm-size 1g \
 ghcr.io/huggingface/text-generation-inference:1.4 \
---model-id ${HF_LLM_MODEL}
+--model-id ${LLM_MODEL}
+
+# Start the microservice with an endpoint as the above docker run command
+export TGI_LLM_ENDPOINT="http://${TGI_HOST_IP}:8008"
+
+python llm.py
 ```
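Once the TGI backend from the block above is up on port 8008, a quick sanity check could look like the following; the `/generate` route is TGI's native API and the prompt is arbitrary (this request is not part of the commit).

```bash
# Verify the TGI backend responds before starting llm.py (port 8008 per the docker run above).
curl http://${TGI_HOST_IP}:8008/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is Deep Learning?", "parameters": {"max_new_tokens": 32}}'
```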

-#### 2.1.2 Start vLLM Service
+#### 1.2.2 Start the vLLM Service
+
+Install the requirements for vLLM Service

 ```bash
-# Use the script to build the docker image as opea/vllm:cpu
-bash ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/vllm/build_docker_vllm.sh cpu
+cd ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/vllm/langchain
+
+pip install -r requirements.txt
+```

-export HF_LLM_MODEL=${your_hf_llm_model}
+Execute the docker run command to initiate the backend, along with the Python script that launches the microservice.
+
+```bash
+export vLLM_HOST_IP=$(hostname -I | awk '{print $1}') # This sets IP of the current machine
+export LLM_MODEL=${your_hf_llm_model}
+export DATA_DIR=$HOME/data # Location to download the model
 export HF_TOKEN=${your_hf_api_token}

-docker run -it \
+# Build the image first as opea/vllm:cpu
+bash ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/vllm/langchain/dependency/build_docker_vllm.sh cpu
+
+# Initiate the backend
+docker run -d -it \
 --name vllm_service \
 -p 8008:80 \
 -e HF_TOKEN=${HF_TOKEN} \
 -e VLLM_CPU_KVCACHE_SPACE=40 \
--v ./data:/data \
+-v ${DATA_DIR}:/data \
 opea/vllm:cpu \
---model ${HF_LLM_MODEL}
+--model ${LLM_MODEL} \
 --port 80
+
+# Start the microservice with an endpoint as the above docker run command
+export vLLM_ENDPOINT="http://${vLLM_HOST_IP}:8008"
+
+python llm.py
+```
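Similarly, the vLLM backend started above exposes an OpenAI-compatible API on port 8008; an example request (not part of the commit) might be:

```bash
# Optional check of the vLLM backend; /v1/completions is vLLM's OpenAI-compatible route.
curl http://${vLLM_HOST_IP}:8008/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "'"${LLM_MODEL}"'", "prompt": "What is Deep Learning?", "max_tokens": 32}'
```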
+
+#### 1.2.3 Start the Ray Service
+
+Install the requirements for Ray Service
+
+```bash
+cd ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/vllm/ray
+
+pip install -r requirements.txt
 ```

-#### 2.1.3 Start Ray Service
+Execute the docker run command to initiate the backend, along with the Python script that launches the microservice.

 ```bash
-export HF_LLM_MODEL=${your_hf_llm_model}
-export HF_CHAT_PROCESSOR=${your_hf_chatprocessor}
+export vLLM_RAY_HOST_IP=$(hostname -I | awk '{print $1}') # This sets IP of the current machine
+export LLM_MODEL=${your_hf_llm_model}
+export DATA_DIR=$HOME/data # Location to download the model
 export HF_TOKEN=${your_hf_api_token}
-export TRUST_REMOTE_CODE=True

-docker run -it \
+# Build the image first as opea/vllm:cpu
+bash ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/vllm/ray/dependency/build_docker_vllmray.sh
+
+# Initiate the backend
+docker run \
+--name="vllm-ray-service" \
 --runtime=habana \
---name ray_serve_service \
+-v $DATA_DIR:/data \
+-e HABANA_VISIBLE_DEVICES=all \
 -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
 --cap-add=sys_nice \
 --ipc=host \
--p 8008:80 \
+-p 8006:8000 \
 -e HF_TOKEN=$HF_TOKEN \
--e TRUST_REMOTE_CODE=$TRUST_REMOTE_CODE \
-opea/llm-ray:latest \
+opea/vllm_ray:habana \
 /bin/bash -c " \
 ray start --head && \
-python api_server_openai.py \
---port_number 80 \
---model_id_or_path ${HF_LLM_MODEL} \
---chat_processor ${HF_CHAT_PROCESSOR}"
-```
-
-### 2.2 Start LLM Service with image built from source
-
-If you start an LLM microservice with docker, the `docker_compose_llm.yaml` file will automatically start a TGI/vLLM service with docker.
-
-#### 2.2.1 Setup Environment Variables
+python vllm_ray_openai.py \
+--port_number 8000 \
+--model_id_or_path $LLM_MODEL \
+--tensor_parallel_size 2 \
+--enforce_eager False"

-In order to start TGI and LLM services, you need to setup the following environment variables first.
+# Start the microservice with an endpoint as the above docker run command
+export vLLM_RAY_ENDPOINT="http://${vLLM_RAY_HOST_IP}:8006"

-```bash
-export HF_TOKEN=${your_hf_api_token}
-export TGI_LLM_ENDPOINT="http://${your_ip}:8008"
-export LLM_MODEL_ID=${your_hf_llm_model}
+python llm.py
 ```
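For the Ray-based backend, `vllm_ray_openai.py` likewise appears to serve an OpenAI-style API through the `vLLM_RAY_ENDPOINT` exported above (port 8006); a sketch of a test request, assuming that route:

```bash
# Optional check of the vLLM-on-Ray backend before starting llm.py.
curl ${vLLM_RAY_ENDPOINT}/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "'"${LLM_MODEL}"'", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens": 32}'
```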

-In order to start vLLM and LLM services, you need to setup the following environment variables first.
-
-```bash
-export HF_TOKEN=${your_hf_api_token}
-export vLLM_LLM_ENDPOINT="http://${your_ip}:8008"
-export LLM_MODEL_ID=${your_hf_llm_model}
-```
-
-In order to start Ray serve and LLM services, you need to setup the following environment variables first.
+## 🚀2. Start Microservice with Docker (Option 2)

-```bash
-export HF_TOKEN=${your_hf_api_token}
-export RAY_Serve_ENDPOINT="http://${your_ip}:8008"
-export LLM_MODEL=${your_hf_llm_model}
-export CHAT_PROCESSOR="ChatModelLlama"
-```
+In order to start the microservices with docker, you need to build the docker images first for the microservice.

-### 2.2 Build Docker Image
+### 2.1 Build Docker Image

-#### 2.2.1 TGI
+#### 2.1.1 TGI

 ```bash
+# Build the microservice docker
 cd ${OPEA_GENAICOMPS_ROOT}

 docker build \
--t opea/llm-tgi:latest \
 --build-arg https_proxy=$https_proxy \
 --build-arg http_proxy=$http_proxy \
+-t opea/llm-tgi:latest \
 -f comps/llms/text-generation/tgi/Dockerfile .
 ```

-#### 2.2.2 vLLM
-
-Build vllm docker.
+#### 2.1.2 vLLM

 ```bash
-bash ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/vllm/langchain/dependency/build_docker_vllm.sh
-```
-
-Build microservice docker.
+# Build vllm docker
+bash ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/vllm/langchain/dependency/build_docker_vllm.sh hpu

-```bash
+# Build the microservice docker
 cd ${OPEA_GENAICOMPS_ROOT}

 docker build \
--t opea/llm-vllm:latest \
 --build-arg https_proxy=$https_proxy \
 --build-arg http_proxy=$http_proxy \
+-t opea/llm-vllm:latest \
 -f comps/llms/text-generation/vllm/langchain/Dockerfile .
 ```

-#### 2.2.3 Ray Serve
-
-Build Ray Serve docker.
+#### 2.1.3 Ray

 ```bash
+# Build the Ray Serve docker
 bash ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/vllm/ray/dependency/build_docker_vllmray.sh
-```
-
-Build microservice docker.

-```bash
+# Build the microservice docker
 cd ${OPEA_GENAICOMPS_ROOT}

 docker build \
--t opea/llm-ray:latest \
 --build-arg https_proxy=$https_proxy \
 --build-arg http_proxy=$http_proxy \
+-t opea/llm-vllm-ray:latest \
 -f comps/llms/text-generation/vllm/ray/Dockerfile .
 ```
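After the three builds above, a quick way to confirm the images landed locally (image names taken from the build commands in this diff):

```bash
# List the microservice images built in section 2.1.
docker images | grep -E 'opea/llm-tgi|opea/llm-vllm|opea/llm-vllm-ray'
```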

+### 2.2 Start LLM Service with the built image
+
 To start a docker container, you have two options:

 - A. Run Docker with CLI
 - B. Run Docker with Docker Compose

-You can choose one as needed.
+You can choose one as needed. If you start an LLM microservice with docker compose, the `docker_compose_llm.yaml` file will automatically start both endpoint and the microservice docker.
+
+#### 2.2.1 Setup Environment Variables
+
+In order to start TGI and LLM services, you need to setup the following environment variables first.
+
+```bash
+export HF_TOKEN=${your_hf_api_token}
+export TGI_LLM_ENDPOINT="http://${your_ip}:8008"
+export LLM_MODEL=${your_hf_llm_model}
+export DATA_DIR=$HOME/data
+```
+
+In order to start vLLM and LLM services, you need to setup the following environment variables first.
+
+```bash
+export HF_TOKEN=${your_hf_api_token}
+export vLLM_LLM_ENDPOINT="http://${your_ip}:8008"
+export LLM_MODEL=${your_hf_llm_model}
+```
+
+In order to start Ray serve and LLM services, you need to setup the following environment variables first.
+
+```bash
+export HF_TOKEN=${your_hf_api_token}
+export RAY_Serve_ENDPOINT="http://${your_ip}:8008"
+export LLM_MODEL=${your_hf_llm_model}
+export CHAT_PROCESSOR="ChatModelLlama"
+```
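With those variables exported, the Docker Compose path mentioned in option B would be roughly the following; the directory holding `docker_compose_llm.yaml` depends on the chosen microservice and is assumed here, not stated in the diff.

```bash
# Example for the TGI variant; substitute the folder of your chosen microservice.
cd ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/tgi
docker compose -f docker_compose_llm.yaml up -d
```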

 ### 2.3 Run Docker with CLI (Option A)

 #### 2.3.1 TGI

+Start TGI endpoint.
+
+```bash
+docker run -d \
+-p 8008:80 \
+-e HF_TOKEN=${HF_TOKEN} \
+-v ${DATA_DIR}:/data \
+--name tgi_service \
+--shm-size 1g \
+ghcr.io/huggingface/text-generation-inference:1.4 \
+--model-id ${LLM_MODEL}
+```
+
+Start TGI microservice
+
 ```bash
 docker run -d \
 --name="llm-tgi-server" \
@@ -272,7 +307,7 @@ docker run \
 -e no_proxy=${no_proxy} \
 -e vLLM_LLM_ENDPOINT=$vLLM_LLM_ENDPOINT \
 -e HF_TOKEN=$HF_TOKEN \
--e LLM_MODEL_ID=$LLM_MODEL_ID \
+-e LLM_MODEL=$LLM_MODEL \
 opea/llm-vllm:latest
 ```

@@ -365,10 +400,10 @@ curl http://${your_ip}:8008/v1/chat/completions \
 "model": ${your_hf_llm_model},
 "messages": [
 {"role": "assistant", "content": "You are a helpful assistant."},
-{"role": "user", "content": "What is Deep Learning?"},
+{"role": "user", "content": "What is Deep Learning?"}
 ],
 "max_tokens": 32,
-"stream": True
+"stream": true
 }'
 ```
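The hunk above fixes the request body so it is valid JSON (lowercase `true`, no trailing comma after the last message). Put together, the corrected call would look roughly like this sketch; the header flag is assumed rather than shown in the hunk, and the model id is spliced in so the shell variable expands.

```bash
# Full request after the fix; JSON booleans must be lowercase.
curl http://${your_ip}:8008/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "'"${your_hf_llm_model}"'",
    "messages": [
      {"role": "assistant", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is Deep Learning?"}
    ],
    "max_tokens": 32,
    "stream": true
  }'
```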

comps/llms/text-generation/vllm/langchain/dependency/build_docker_vllm.sh

Lines changed: 2 additions & 0 deletions
@@ -35,4 +35,6 @@ else
 git clone https://github.com/vllm-project/vllm.git
 cd ./vllm/
 docker build -f Dockerfile.cpu -t opea/vllm:cpu --shm-size=128g . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
+cd ..
+rm -rf vllm
 fi
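For context, the CPU branch of `build_docker_vllm.sh` after this change reads roughly as follows (reconstructed from the hunk above): the cloned vLLM checkout is deleted once the image is built, so repeated runs start from a clean workspace.

```bash
# CPU branch of build_docker_vllm.sh after the commit (reconstructed).
git clone https://github.com/vllm-project/vllm.git
cd ./vllm/
docker build -f Dockerfile.cpu -t opea/vllm:cpu --shm-size=128g . \
  --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
cd ..
rm -rf vllm
```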
