---
title: Performance Monitoring Workflows with Judgment
---
## Overview ##

`judgeval` contains a suite of monitoring tools that allow you to **measure the quality of your LLM applications in production** scenarios.

Using `judgeval` in production, you can:
- Measure the quality of your LLM agent systems in **real time** using Judgment's **10+ research-backed scoring metrics**.
- Check for regressions in **retrieval quality, hallucinations, and any other scoring metric you care about**.
- Measure token usage.
- Track latency of different system components (web searching, LLM generation, etc.), as sketched below.
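
Per-component latency (and, for wrapped LLM clients, token usage) becomes visible once each component is traced as its own span. Here is a minimal sketch of that idea using the `Tracer` and `observe` APIs from the [tracing module](/monitoring/tracing); the `search_web` and `plan_trip` functions, the project name, and the `span_type` labels are hypothetical, and import paths may differ across `judgeval` versions.

```python
from judgeval.common.tracer import Tracer  # import path may vary by judgeval version

judgment = Tracer(project_name="travel_agent")  # hypothetical project name

@judgment.observe(span_type="tool")
def search_web(query: str) -> str:
    # Placeholder for a real web-search call; the span records its latency.
    return f"results for {query}"

@judgment.observe(span_type="function")
def plan_trip(destination: str) -> str:
    # Each traced call below appears as a nested span with its own timing.
    context = search_web(f"things to do in {destination}")
    # An LLM generation call (ideally via a wrapped client) would go here.
    return f"Trip plan using: {context}"
```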
<Tip>
**Why evaluate your system in production?**

Production data **provides the highest signal** for improving your LLM system on use cases you care about.
Judgment Labs' infrastructure enables LLM teams to **capture quality signals from production use cases** and
provides [**actionable insights**](/monitoring/production_insights) for improving any component of your system.
</Tip>

## Standard Setup ##

A typical setup of `judgeval` on production systems involves:
- Tracing your application using `judgeval`'s [tracing module](/monitoring/tracing).
- Embedding evaluation runs into your traces using the `async_evaluate()` function (see the sketch below).
- Tracking your LLM agent's performance in real time using the [Judgment platform](/judgment/introduction).
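
Putting these steps together, here is a minimal sketch of what such a setup can look like: a traced entry point whose wrapped LLM client records generation spans, with an evaluation embedded via `async_evaluate()`. The scorer choice, project name, and `answer_question` function are illustrative, and import paths and scorer names may differ from your `judgeval` version.

```python
from openai import OpenAI
from judgeval.common.tracer import Tracer, wrap     # import paths may vary by version
from judgeval.scorers import AnswerRelevancyScorer  # illustrative scorer choice

judgment = Tracer(project_name="production_agent")  # hypothetical project name
client = wrap(OpenAI())  # wrapped client records LLM calls (latency, tokens) as spans

@judgment.observe(span_type="function")
def answer_question(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )
    answer = response.choices[0].message.content

    # Embed an evaluation run into the trace; it is scored asynchronously,
    # so it does not block the response path.
    judgment.async_evaluate(
        scorers=[AnswerRelevancyScorer(threshold=0.5)],
        input=question,
        actual_output=answer,
        model="gpt-4o",
    )
    return answer
```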
For a full example of how to set up `judgeval` in a production system, see our [OpenAI Travel Agent example](/monitoring/tracing#example-openai-travel-agent).