
Commit 0c02053

Add introductory page in docs for monitoring
1 parent 0af7717 commit 0c02053

File tree

1 file changed: +30 -0 lines changed


docs/monitoring/introduction.mdx

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
---
title: Performance Monitoring Workflows with Judgment
---

## Overview ##
`judgeval` contains a suite of monitoring tools that allow you to **measure the quality of your LLM applications in production** scenarios.

Using `judgeval` in production, you can:
- Measure the quality of your LLM agent systems in **real time** using Judgment's **10+ research-backed scoring metrics**.
- Check for regressions in **retrieval quality, hallucinations, and any other scoring metric you care about**.
- Measure token usage.
- Track latency of different system components (web search, LLM generation, etc.).

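For instance, scoring a single production interaction against one of these metrics might look like the following minimal sketch. It assumes the `JudgmentClient`, `Example`, and `FaithfulnessScorer` interfaces from `judgeval`'s evaluation tooling; the example data is purely illustrative:

```python
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import FaithfulnessScorer

client = JudgmentClient()  # assumes your Judgment API key is set in the environment

# A production interaction captured from your application (illustrative data)
example = Example(
    input="What if these shoes don't fit?",
    actual_output="We offer a 30-day full refund at no extra cost.",
    retrieval_context=["All customers are eligible for a 30-day full refund at no extra cost."],
)

# Check the response for hallucinations against its retrieval context
results = client.run_evaluation(
    examples=[example],
    scorers=[FaithfulnessScorer(threshold=1.0)],
    model="gpt-4o",
)
print(results)
```
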
<Tip>
**Why evaluate your system in production?**

Production data **provides the highest signal** for improving your LLM system on the use cases you care about.
Judgment Labs' infrastructure enables LLM teams to **capture quality signals from production use cases** and
provides [**actionable insights**](/monitoring/production_insights) for improving any component of your system.
</Tip>

## Standard Setup ##
A typical setup of `judgeval` on production systems involves:
- Tracing your application using `judgeval`'s [tracing module](/monitoring/tracing).
- Embedding evaluation runs into your traces using the `async_evaluate()` function, as sketched below.
- Tracking your LLM agent's performance in real time using the [Judgment platform](/judgment/introduction).
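
Putting these pieces together, a traced function with an embedded evaluation might look like this minimal sketch. It assumes the `Tracer` and `observe` interfaces described in the tracing module docs and the `async_evaluate()` call shown there; `call_llm` and the project name are hypothetical placeholders:

```python
from judgeval.tracer import Tracer
from judgeval.scorers import AnswerRelevancyScorer

judgment = Tracer(project_name="my_production_app")  # illustrative project name

def call_llm(question: str) -> str:
    # Hypothetical placeholder for your real LLM call (OpenAI, Anthropic, etc.)
    return "A sample answer to: " + question

@judgment.observe(span_type="function")
def answer_question(question: str) -> str:
    answer = call_llm(question)
    # Embed an evaluation run into the current trace span
    judgment.async_evaluate(
        scorers=[AnswerRelevancyScorer(threshold=0.5)],
        input=question,
        actual_output=answer,
        model="gpt-4o",
    )
    return answer

answer_question("What's the best time to visit Tokyo?")
```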

For a full example of how to set up `judgeval` in a production system, see our [OpenAI Travel Agent example](/monitoring/tracing#example-openai-travel-agent).
