Commit 2b38af8 (parent 866896f)

🤖 Auto-update model configuration

- Generated from EMD Python model definitions
- Updated at: $(date -u '+%Y-%m-%d %H:%M:%S UTC')
- Triggered by: workflow_dispatch
- Source commit: 937cac4

28 files changed: +6472 −0 lines

en/api.md (381 additions, 0 deletions)
# API Documentation

> **Getting Started**: To obtain the base URL and API key for your deployed models, run `emd status` in your terminal. The command displays a table with your deployed models and their details, including a link to retrieve the API key from AWS Secrets Manager. The base URL is shown at the bottom of the output.
>
> Example output:
> ```
> Models
> ┌────────────────────────┬───────────────────────────────────────────────────────────────────────┐
> │ Model ID               │ Qwen2.5-0.5B-Instruct/dev                                             │
> │ Status                 │ CREATE_COMPLETE                                                       │
> │ Service Type           │ Amazon SageMaker AI Real-time inference with OpenAI Compatible API    │
> │ Instance Type          │ ml.g5.2xlarge                                                         │
> │ Create Time            │ 2025-05-08 12:27:05 UTC                                               │
> │ Query Model API Key    │ https://console.aws.amazon.com/secretsmanager/secret?name=EMD-APIKey- │
> │                        │ Secrets&region=us-east-1                                              │
> │ SageMakerEndpointName  │ EMD-Model-qwen2-5-0-5b-instruct-endpoint                              │
> └────────────────────────┴───────────────────────────────────────────────────────────────────────┘
>
> Base URL
> http://your-emd-endpoint.region.elb.amazonaws.com/v1
> ```

## List Models

Returns a list of available models.

**Endpoint:** `GET /v1/models`

**Curl Example:**
```bash
curl https://BASE_URL/v1/models
```

**Python Example:**
```python
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",  # no API key is needed for listing models, but the client requires a value
    base_url="https://BASE_URL"
)

# List available models
models = client.models.list()
for model in models.data:
    print(model.id)
```

## Chat Completions

Create a model response for a conversation.

**Endpoint:** `POST /v1/chat/completions`

**Parameters:**

- `model` (required): ID of the model to use (e.g., "Qwen2.5-7B-Instruct/dev", "Llama-3.3-70B-Instruct/dev")
- `messages` (required): Array of message objects with `role` and `content`
- `temperature`: Sampling temperature (0-2, default: 1)
- `top_p`: Nucleus sampling parameter (0-1, default: 1)
- `n`: Number of chat completion choices to generate (default: 1)
- `stream`: Whether to stream partial progress (default: false)
- `stop`: Sequences where the API will stop generating
- `max_tokens`: Maximum number of tokens to generate
- `presence_penalty`: Penalty for new tokens based on presence (-2.0 to 2.0)
- `frequency_penalty`: Penalty for new tokens based on frequency (-2.0 to 2.0)
- `function_call`: Controls how the model responds to function calls
- `functions`: List of functions the model may generate JSON inputs for

**Curl Example:**
```bash
curl https://BASE_URL/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen2.5-7B-Instruct/dev",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7
  }'
```

**Python Example:**
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://BASE_URL"
)

# Create a chat completion
response = client.chat.completions.create(
    model="Qwen2.5-7B-Instruct/dev",  # Model ID with tag
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    temperature=0.7,
    stream=False
)

# Print the response
print(response.choices[0].message.content)
```

**Streaming Example:**
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://BASE_URL"
)

# Create a streaming chat completion
stream = client.chat.completions.create(
    model="Llama-3.3-70B-Instruct/dev",  # Model ID with tag
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a short poem about AI."}
    ],
    stream=True
)

# Process the stream
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
print()
```

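Because the API is stateless, the `messages` array must carry the entire conversation: for multi-turn chat, append each assistant reply to the history before sending the next request. A minimal sketch of that bookkeeping (the assistant reply below is a hard-coded stand-in for an actual `response.choices[0].message.content`):

```python
# Maintain conversation state by appending each turn to the messages list.
history = [{"role": "system", "content": "You are a helpful assistant."}]

def add_user_turn(history, text):
    history.append({"role": "user", "content": text})

def add_assistant_turn(history, text):
    history.append({"role": "assistant", "content": text})

add_user_turn(history, "Hello!")
add_assistant_turn(history, "Hi! How can I help?")  # stand-in for a real model reply
add_user_turn(history, "Tell me a joke.")

# The full history is sent on every request, e.g.:
# client.chat.completions.create(model="Qwen2.5-7B-Instruct/dev", messages=history)
print(len(history))  # 4 messages
```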
## Embeddings

> Some embedding models may have additional parameters or usage guidelines specified in their official documentation. For model-specific details, please refer to the provider's documentation.

Get vector representations of text.

**Endpoint:** `POST /v1/embeddings`

**Parameters:**

- `model` (required): ID of the model to use (e.g., "bge-m3/dev")
- `input` (required): Input text to embed or array of texts
- `user`: A unique identifier for the end-user

**Curl Example:**
```bash
curl https://BASE_URL/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-m3/dev",
    "input": "The food was delicious and the waiter..."
  }'
```

**Python Example:**
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://BASE_URL"
)

# Get embeddings for a single text
response = client.embeddings.create(
    model="bge-m3/dev",  # Embedding model ID with tag
    input="The food was delicious and the service was excellent."
)

# Print the embedding vector
print(response.data[0].embedding)

# Get embeddings for multiple texts
response = client.embeddings.create(
    model="bge-m3/dev",  # Embedding model ID with tag
    input=[
        "The food was delicious and the service was excellent.",
        "The restaurant was very expensive and the food was mediocre."
    ]
)

# Print the number of embeddings
print(f"Generated {len(response.data)} embeddings")
```

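Embedding vectors are usually compared with cosine similarity: scores near 1 mean the texts are semantically close. A minimal helper, shown here with toy vectors standing in for real `response.data[i].embedding` values:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for response.data[i].embedding
v1 = [0.1, 0.2, 0.3]
v2 = [0.1, 0.2, 0.3]
v3 = [-0.3, 0.2, -0.1]

print(round(cosine_similarity(v1, v2), 6))  # identical vectors score ~1.0
print(round(cosine_similarity(v1, v3), 6))  # dissimilar vectors score lower
```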
## Rerank

> Some reranking models may have additional parameters or usage guidelines specified in their official documentation. For model-specific details, please refer to the provider's documentation.

Rerank a list of documents based on their relevance to a query.

**Endpoint:** `POST /v1/rerank`

**Parameters:**

- `model` (required): ID of the model to use (e.g., "bge-reranker-v2-m3/dev")
- `query` (required): The search query
- `documents` (required): List of documents to rerank
- `max_rerank`: Maximum number of documents to rerank (default: all)
- `return_metadata`: Whether to return metadata (default: false)

**Curl Example:**
```bash
curl https://BASE_URL/v1/rerank \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3/dev",
    "query": "What is the capital of France?",
    "documents": [
      "Paris is the capital of France.",
      "Berlin is the capital of Germany.",
      "London is the capital of England."
    ]
  }'
```

**Python Example:**

The OpenAI Python SDK has no rerank resource, so call the endpoint directly:

```python
import requests

url = "https://BASE_URL/v1/rerank"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

# Rerank documents based on a query
payload = {
    "model": "bge-reranker-v2-m3/dev",  # Reranking model ID with tag
    "query": "What is the capital of France?",
    "documents": [
        "Paris is the capital of France.",
        "Berlin is the capital of Germany.",
        "London is the capital of England."
    ],
    "max_rerank": 3
}

response = requests.post(url, headers=headers, json=payload)

# Print the reranked documents
for result in response.json()["data"]:
    print(f"Document: {result['document']}")
    print(f"Relevance Score: {result['relevance_score']}")
    print("---")
```

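A common follow-up is to keep only the top-scoring documents. Assuming results shaped like the response fields shown above (`document`, `relevance_score`; the scores here are illustrative), the sorting is plain Python:

```python
# Sample results shaped like the rerank response above (scores are illustrative).
results = [
    {"document": "Berlin is the capital of Germany.", "relevance_score": 0.12},
    {"document": "Paris is the capital of France.", "relevance_score": 0.98},
    {"document": "London is the capital of England.", "relevance_score": 0.08},
]

# Highest-relevance documents first.
ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
top_document = ranked[0]["document"]
print(top_document)  # "Paris is the capital of France."
```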
## Invocations

General-purpose endpoint for model invocations.

**Endpoint:** `POST /v1/invocations`

**Parameters:**

- `model` (required): ID of the model to use
- `input`: Input data for the model
- `parameters`: Additional parameters for the model

**Curl Example:**
```bash
curl https://BASE_URL/v1/invocations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen2.5-7B-Instruct/dev",
    "input": {
      "query": "What is machine learning?"
    },
    "parameters": {
      "max_tokens": 100
    }
  }'
```

**Python Example:**
```python
import requests

# Set up the API endpoint and headers
url = "https://BASE_URL/v1/invocations"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

# Prepare the payload
payload = {
    "model": "Qwen2.5-7B-Instruct/dev",  # Model ID with tag
    "input": {
        "query": "What is machine learning?"
    },
    "parameters": {
        "max_tokens": 100
    }
}

# Make the API call (json= serializes the payload and sets the body)
response = requests.post(url, headers=headers, json=payload)

# Print the response
print(response.json())
```

## Vision Models

Process images along with text prompts.

**Endpoint:** `POST /v1/chat/completions`

**Parameters:**
Same as Chat Completions, but with messages that include image content.

**Python Example:**
```python
from openai import OpenAI
import base64

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "path/to/your/image.jpg"
base64_image = encode_image(image_path)

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://BASE_URL"
)

response = client.chat.completions.create(
    model="Qwen2-VL-7B-Instruct/dev",  # Vision model ID with tag
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
```

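The encoding and message-shaping steps above can be wrapped in a small helper that turns raw image bytes into the `image_url` content part the endpoint expects. A minimal sketch (the helper name is illustrative, and the three bytes below stand in for a real JPEG file):

```python
import base64

def image_content_part(image_bytes, mime_type="image/jpeg"):
    """Build the image_url content part for a vision chat message."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{b64}"}
    }

part = image_content_part(b"\xff\xd8\xff")  # stand-in for real JPEG bytes
print(part["image_url"]["url"])
```

The returned dict can be appended to a message's `content` list alongside `{"type": "text", ...}` parts, as in the example above.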
## Audio Transcription

Transcribe audio files to text.

**Endpoint:** `POST /v1/audio/transcriptions`

**Python Example:**
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://BASE_URL"
)

audio_file_path = "path/to/audio.mp3"
with open(audio_file_path, "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-large-v3/dev",  # ASR model ID with tag
        file=audio_file
    )

print(response.text)  # Transcribed text
```
