Commit d720006

Update README.md (#515)
1 parent 88b6509 commit d720006

1 file changed, with 5 additions and 45 deletions.

README.md

Lines changed: 5 additions & 45 deletions
@@ -5,7 +5,7 @@
 
 <br>
 <div style="font-size: 1.5em;">
-Enable self-learning agents with traces, evals, and environment data.
+Enable self-learning agents with environment data and evals.
 </div>
 
 ## [Docs](https://docs.judgmentlabs.ai/)[Judgment Cloud](https://app.judgmentlabs.ai/register)[Self-Host](https://docs.judgmentlabs.ai/documentation/self-hosting/get-started)[Landing Page](https://judgmentlabs.ai/)
@@ -22,11 +22,11 @@ We're hiring! Join us in our mission to enable self-learning agents by providing
 
 </div>
 
-Judgeval offers **open-source tooling** for tracing and evaluating autonomous, stateful agents. It **provides runtime data from agent-environment interactions** for continuous learning and self-improvement.
+Judgeval offers **open-source tooling** for evaluating autonomous, stateful agents. It **provides runtime data from agent-environment interactions** for continuous learning and self-improvement.
 
 ## 🎬 See Judgeval in Action
 
-**[Multi-Agent System](https://github.com/JudgmentLabs/judgment-cookbook/tree/main/cookbooks/agents/multi-agent) with complete observability:** (1) A multi-agent system spawns agents to research topics on the internet. (2) With just **3 lines of code**, Judgeval traces every input/output + environment response across all agent tool calls for debugging. (3) After completion, (4) export all interaction data to enable further environment-specific learning and optimization.
+**[Multi-Agent System](https://github.com/JudgmentLabs/judgment-cookbook/tree/main/cookbooks/agents/multi-agent) with complete observability:** (1) A multi-agent system spawns agents to research topics on the internet. (2) With just **3 lines of code**, Judgeval captures all environment responses across all agent tool calls for monitoring. (3) After completion, (4) export all interaction data to enable further environment-specific learning and optimization.
 
 <table style="width: 100%; max-width: 800px; table-layout: fixed;">
 <tr>
@@ -35,8 +35,8 @@ Judgeval offers **open-source tooling** for tracing and evaluating autonomous, s
 <br><strong>🤖 Agents Running</strong>
 </td>
 <td align="center" style="padding: 8px; width: 50%;">
-<img src="assets/trace.gif" alt="Trace Demo" style="width: 100%; max-width: 350px; height: auto;" />
-<br><strong>📊 Real-time Tracing</strong>
+<img src="assets/trace.gif" alt="Capturing Environment Data Demo" style="width: 100%; max-width: 350px; height: auto;" />
+<br><strong>📊 Capturing Environment Data </strong>
 </td>
 </tr>
 <tr>
@@ -77,51 +77,11 @@ export JUDGMENT_ORG_ID=...
 
 **If you don't have keys, [create an account](https://app.judgmentlabs.ai/register) on the platform!**
 
-## 🏁 Quickstarts
-
-### 🛰️ Tracing
-
-Create a file named `agent.py` with the following code:
-
-```python
-from judgeval.tracer import Tracer, wrap
-from openai import OpenAI
-
-client = wrap(OpenAI()) # tracks all LLM calls
-judgment = Tracer(project_name="my_project")
-
-@judgment.observe(span_type="tool")
-def format_question(question: str) -> str:
-    # dummy tool
-    return f"Question : {question}"
-
-@judgment.observe(span_type="function")
-def run_agent(prompt: str) -> str:
-    task = format_question(prompt)
-    response = client.chat.completions.create(
-        model="gpt-4.1",
-        messages=[{"role": "user", "content": task}]
-    )
-    return response.choices[0].message.content
-
-run_agent("What is the capital of the United States?")
-```
-You'll see your trace exported to the Judgment Platform:
-
-<p align="center"><img src="assets/online_eval.png" alt="Judgment Platform Trace Example" width="1500" /></p>
-
-
-[Click here](https://docs.judgmentlabs.ai/documentation/tracing/introduction) for a more detailed explanation.
-
-
-<!-- Created by https://github.com/ekalinin/github-markdown-toc -->
-
 
 ## ✨ Features
 
 | | |
 |:---|:---:|
-| <h3>🔍 Tracing</h3>Automatic agent tracing integrated with common frameworks (LangGraph, OpenAI, Anthropic). **Tracks inputs/outputs, agent tool calls, latency, cost, and custom metadata** at every step.<br><br>**Useful for:**<br>• 🐛 Debugging agent runs <br>• 📋 Collecting agent environment data <br>• 🔬 Pinpointing performance bottlenecks| <p align="center"><img src="assets/agent_trace_example.png" alt="Tracing visualization" width="1200"/></p> |
 | <h3>🧪 Evals</h3>Build custom evaluators on top of your agents. Judgeval supports LLM-as-a-judge, manual labeling, and code-based evaluators that connect with our metric-tracking infrastructure. <br><br>**Useful for:**<br>• ⚠️ Unit-testing <br>• 🔬 A/B testing <br>• 🛡️ Online guardrails | <p align="center"><img src="assets/test.png" alt="Evaluation metrics" width="800"/></p> |
 | <h3>📡 Monitoring</h3>Get Slack alerts for agent failures in production. Add custom hooks to address production regressions.<br><br> **Useful for:** <br>• 📉 Identifying degradation early <br>• 📈 Visualizing performance trends across agent versions and time | <p align="center"><img src="assets/errors.png" alt="Monitoring Dashboard" width="1200"/></p> |
 | <h3>📊 Datasets</h3>Export traces and test cases to datasets for scaled analysis and optimization. Move datasets to/from Parquet, S3, etc. <br><br>Run evals on datasets as unit tests or to A/B test different agent configurations, enabling continuous learning from production interactions. <br><br> **Useful for:**<br>• 🗃️ Agent environment interaction data for optimization<br>• 🔄 Scaled analysis for A/B tests | <p align="center"><img src="assets/datasets_preview_screenshot.png" alt="Dataset management" width="1200"/></p> |
