Commit 2e80350

Update structure
Signed-off-by: Sherlock113 <sherlockxu07@gmail.com>
1 parent b05df63 commit 2e80350

5 files changed: +16 -5 lines changed


docs/inference-optimization/index.mdx

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 3
+sidebar_position: 4
 sidebar_custom_props:
   icon: /img/speed.svg
 ---

docs/inference-optimization/llm-inference-metrics.md

Lines changed: 11 additions & 0 deletions
@@ -10,6 +10,7 @@ keywords:
 ---
 
 import LinkList from '@site/src/components/LinkList';
+import Button from '@site/src/components/Button';
 
 # Key metrics for LLM inference
 
@@ -105,6 +106,16 @@ There are two common ways to measure throughput:
 - GPU memory bandwidth and compute utilization
 
 As the number of concurrent requests increases, the total TPS also grows, until the LLM hits the saturation point of available compute resources. Beyond this point, performance might decrease because the LLM is over capacity.
+
+---
+
+At Bento, we offer deployment and inference optimization strategies tailored to your use case. You can easily leverage them to optimize for throughput, latency, or cost.
+
+![bento-different-inference-optimizations.png](./img/bento-different-inference-optimizations.png)
+
+<div style={{ margin: '3rem 0' }}>
+[<Button>Talk to us</Button>](https://l.bentoml.com/contact-us-llm-inference-handbook)
+</div>
 
 ## Goodput
 

docs/infrastructure-and-operations/index.mdx

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 4
+sidebar_position: 3
 sidebar_custom_props:
   icon: /img/setting.svg
 ---

src/components/Chat/index.tsx

Lines changed: 3 additions & 3 deletions
@@ -38,9 +38,9 @@ function Chat() {
 </button>
 <h4>Talk to Us</h4>
 <p>
-At Bento, we're working to help enterprises leverage the latest
-advancements in LLM inference with ease. Have questions about LLM
-inference? Let's talk.
+At Bento, we help customers build custom LLM serving solutions
+tailored for speed, quality, or cost. Schedule a call to
+learn how we make it easy to apply advanced inference optimizations to your use case.
 </p>
 <div>
 <a
