Commit 72970b3

Merge pull request #27 from Sherlock113/docs/update-structure
docs: Update structure
2 parents b05df63 + 2e80350 commit 72970b3

5 files changed

+16
-5
lines changed


docs/inference-optimization/index.mdx

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 3
+sidebar_position: 4
 sidebar_custom_props:
   icon: /img/speed.svg
 ---

docs/inference-optimization/llm-inference-metrics.md

Lines changed: 11 additions & 0 deletions
@@ -10,6 +10,7 @@ keywords:
 ---
 
 import LinkList from '@site/src/components/LinkList';
+import Button from '@site/src/components/Button';
 
 # Key metrics for LLM inference
 

@@ -105,6 +106,16 @@ There are two common ways to measure throughput:
 - GPU memory bandwidth and compute utilization
 
 As the number of concurrent requests increases, the total TPS also grows, until the LLM hits the saturation point of available compute resources. Beyond this point, performance might decrease because the LLM is over capacity.
+
+---
+
+At Bento, we offer deployment and inference optimization strategies tailored to your use case. You can easily leverage them to optimize for throughput, latency, or cost.
+
+![bento-different-inference-optimizations.png](./img/bento-different-inference-optimizations.png)
+
+<div style={{ margin: '3rem 0' }}>
+[<Button>Talk to us</Button>](https://l.bentoml.com/contact-us-llm-inference-handbook)
+</div>
 
 ## Goodput
 

docs/infrastructure-and-operations/index.mdx

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 4
+sidebar_position: 3
 sidebar_custom_props:
   icon: /img/setting.svg
 ---

src/components/Chat/index.tsx

Lines changed: 3 additions & 3 deletions
@@ -38,9 +38,9 @@ function Chat() {
 </button>
 <h4>Talk to Us</h4>
 <p>
-At Bento, we're working to help enterprises leverage the latest
-advancements in LLM inference with ease. Have questions about LLM
-inference? Let's talk.
+At Bento, we help customers build custom LLM serving solutions
+tailored for speed, quality, or cost. Schedule a call to
+learn how we make it easy to apply advanced inference optimizations to your use case.
 </p>
 <div>
 <a
